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I .   INTRODUCTION 

The  Naval  Space  Surveillance  Center  (NAVSPASUR)  currently 
tracks  daily  over  6000  objects  in  elliptical  orbits  around  the 
Earth.  To  assist  in  identification  and  tracking  of  these 
objects  in  orbit,  NAVSPASUR  uses  an  analytic  satellite  motion 
model  implemented  in  the  Fortran  subroutine,  PPT2 .  This 
subroutine  predicts  an  artificial  satellite's  position  and 
velocity  vectors  at  a  selected  time  to  aid  in  the  tracking 
endeavor.  Several  calls  to  the  subroutine  may  be  required  to 
aid  in  the  identification  of  one  object. 

With  the  current  increase  in  space  operations,  the  number 
of  objects  necessary  to  be  tracked  is  expected  to  increase 
substantially.  Subroutine  PPT2  provides  orbit  prediction 
within  an  adequate  response  time  for  the  current  number  of 
tracked  objects  in  space.  However,  a  substantial  increase  in 
the  number  of  objects  will  cause  the  use  of  PPT2  on  a  serial 
computer  to  become  less  responsive  and  computationally 
burdensome.  Additionally,  if  there  exists  a  desire  to 
increase  the  accuracy  of  the  NAVSPASUR  model,  the  resulting 
subroutine  would  require  even  more  computing  resources  and 
make  achieving  results  even  more  time  consuming. 

Parallel  computing  offers  one  option  to  decrease  the 
computation  time  and  achieve  more  real-time  results .  Use  of 
parallel  computers  has  already  proven  to  be  beneficial  in 


reducing  computation  time  in  many  other  applied  areas. 
Parallel  computing  offers  an  opportunity  to  both  increase  the 
efficiency  of  the  current  model  or  reduce  the  computational 
burden  of  a  more  accurate  future  model. 

The  ultimate  objective  of  this  thesis  is  to  quantitatively 
determine  the  parallel  computing  potential  of  the  current 
NAVSPASUR  analytic  model  and  determine  the  subsequent 
reduction  in  computer  time  if  the  model  is  applied  to  a 
hypercube  multicomputor .  The  following  chapter  provides  a 
description  of  the  NAVSPASUR  satellite  model  and  outlines  the 
algorithm  used  by  the  Fortran  subroutine,  PPT2 .  Chapter  III 
provides  an  overview  of  parallel  processing  and  discusses  the 
methods  to  decompose  a  serial  algorithm  to  be  applied  to  the 
hypercube .  In  Chapter  IV,  the  two  methods  of  decomposing  the 
analytic  model  are  presented  with  their  respective  success  in 
reducing  computation  time.  The  last  chapter  of  this  thesis 
provides  conclusions  and  suggestions  for  future  research. 


II.   NAVSPASUR  SATELLITE  MOTION  MODEL 

A.   OVERVIEW 

The  satellite  motion  model,  adopted  by  NAVSPASUR  and 
implemented  in  subroutine  PPT2,  is  a  general  perturbations 
variation  of  elements  model  of  artificial  satellite  motion 
around  the  Earth.  Given  a  set  of  a  satellite's  "mean"  orbital 
elements  at  a  given  epoch,  the  model  predicts  the  state 
(position  and  velocity)  vector  at  a  future  time.  The  model 
considers  perturbing  accelerations  caused  by  atmospheric  drag, 
oblateness  of  the  Earth,  and  asymmetry  of  the  Earth' s  mass 
about  the  equatorial  plane.  This  model  ignores  perturbations 
due  to  longitudinal  variation  in  the  Earth' s  gravitational 
potential  and  the  influence  of  other  celestial  bodies  such  as 
the  Moon  or  the  Sun . x 

Satellite  motion  models  can  be  classified  by  the  technique 
used  to  integrate  a  satellite's  equations  of  motion  and  the 
method  to  describe  the  variation  of  the  satellite's  orbit  in 
reaction  to  the  perturbing  forces.  The  two  primary  techniques 
to  solve  satellite's  equations  of  motion  are  general 
perturbations   and  special  perturbations .  General  Perturbation 


Although  the  NAVSPASUR  model,  implemented  by  PPT2, 
neglects  the  longitudinal  variation  to  the  Earth' s 
gravitational  potential,  a  correction  for  this  variation  can 
be  made  within  PPT2  by  a  call  to  a  second  subroutine,  LUNAR. 


techniques  involve  an  analytic  integration  of  the  perturbing 
accelerations,  while  special  perturbation  techniques  involve 
the  direct  numerical  integration  of  the  equations  of  motion  to 
include  all  perturbing  accelerations.  Typically,  general 
perturbation  techniques  are  more  difficult  and  not  as  accurate 
as  special  perturbation  techniques.  However,  special 
perturbation  techniques,  in  general,  provide  this  increase  in 
accuracy  at  a  cost  of  several  orders  of  magnitude  of  computing 
resources  (Bate,  1971,  pp.  385  -  414)  .  A  third  technique,  now 
gaining  in  popularity,  is  a  semi-analytic  technique.  This 
technique  is  combination  of  both  the  general  and  special 
perturbation  techniques.  The  NAVSPASUR  model,  a  general 
perturbations  model,  solves  the  satellite's  equations  of 
motion  by  using  series  solutions  to  the  ordinary  differential 
equations . 

Within  these  techniques  exists  two  methods  to  describe  the 
variations  to  a  satellite's  orbit.  One  method,  variation  of 
elements,  describes  variations  to  the  orbit  in  terms  of 
changes  in  the  osculating  orbital  elements  with  respect  to 
time.  The  other  method,  variation  of  coordinates,  selects  a 
coordinate  system  and  describes  variations  to  position  and 
velocity  in  this  coordinate  system  with  respect  to  time.  The 
disadvantage  of  the  variation  of  coordinates  method  is  that 
the  solution  provides  no  immediate  insight  to  the  geometry  of 
the  orbit  (Danby,  1989,  p.  319)  .  Using  the  variation  of 
elements  method,  the  NAVSPASUR  model  describes  the  variations 


to  an  orbit  in  terms  of  changes  to  the  classical  orbital 
elements  with  respect  to  time. 

B .   THEORY 

The  NAVSPASUR  model  is  based  on  a  theory  developed  in  195  9 
by  Dirk  Brouwer  of  Yale  University  (Brouwer,  1959,  pp.  378  - 
3  97)  and  modified  by  R.  L.  Lyddane  of  the  U.  S.  Naval  Weapons 
Laboratory  in  1963  (Lyddane,  1963,  pp.  555  -  558)  .  This 
theory  considers  an  Earth' s  gravitational  potential 
significantly  more  involved  than  the  gravitational  potential 
used  in  the  classical  Kepler  model  of  idealized  satellite 
motion.  The  classical  model  assumes  a  perfectly  spherical 
Earth  and  the  gravitational  potential  may  be  expressed  by 

U=  _E  (2.1) 

r 

where  |l  is  the  gravitational  parameter  and  r  is  the  radial 
distance  of  the  satellite  from  the  center  of  a  spherical  Earth 
(Bate,  1971,  pp.  11  -  16).  The  theory  used  in  the  NAVSPASUR 
model  assumes  only  that  Earth  is  symmetrical  about  the  north- 
south  axis.  Expressing  the  potential  in  spherical  harmonics, 
the  Earth' s  gravitational  potential  is  modeled  by 

C7=ii+Ji£:^J>n*(sinp)  [Cn,qcosg^S„fqsingM     (2.2) 

where  R$  is  the  equatorial  radius  of  the  Earth,  (3  is  the 
satellite  latitude,  X     is  the  longitude,   C„fq  and  Sn  q  are 


coefficients  depending  on  the  mass  distribution,  and  Pnq  are 
the  associated  Legendre  polynomials.  These  polynomials  are 
defined  in  terms  of  the  Legendre  polynomials  Pn: 

P0(x)=l 
Px{x)  =x 

Pn  U>  =2Z±xPn.x  <*>  ~^Pn-2  U)  (2  .  3) 

n  n 

PZ(x)  =  (l-x2)*/2J?lPn(x) 
dxq 

Ignoring  any  longitudinal  variation  in  the  gravitational 

potential,  Equation  2.2  may  be  simplified  to 


U-l-lf  ^.JnPn  <sin(3)  (2.4) 

*  *■ n=i  r 


where  Jn=-Cnq.   The  even  Jn'  s  account  for  the  oblateness  of  the 
Earth,  while  the  odd  Jn'  s  account  for  the  Earth' s  asymmetry 
about  the  equatorial  axis.  (Solomon,  1991,  p.  3) 
1.   Brouwer's  Model 

Dirk  Brouwer  developed  his  theory  of  artificial 
satellite  motion  while  under  contract  by  the  Air  Force 
Cambridge  Research  Center  and  published  this  theory  in  the 
Astronomical  Journal  in  1959.  In  this  article,  Brouwer  used 
a  different  notation  for  the  classical  orbital  elements  from 
the  notation  commonly  recognized  today.  This  notation  is  also 
adopted  in  the  later  NAVSPASUR  model .  In  order  to  conform  to 
the  notation  of  Brouwer's  original  article  and  NAVSPASUR 
model,  this  paper  will  also  use  Brouwer's  notation  listed  in 
Table  2.1. 


Table  2.1  Brouwer's  Notation 


Brouwer ' s 

Notation 

Common 

Notation 

a 

semi-major  axis 

a 

e 

eccentricity 

e 

I 

inclination 

i 

g 

argument  of  peri 

gee 

CO 

h 

ascending  node 

n 

f 

true  anomaly 

V 

1 

mean  anomaly 

M 

Brouwer' s  model  considers  the  zonal  harmonics  of  the 
Earth's  gravitational  potential,  accounting  for  the  Earth's 
oblateness  and  its  asymmetry  about  the  equatorial  axis  as 
expressed  in  Equation  2.4.  To  simplify  the  potential 
equation,  his  model  uses  only  the  first  four  non-zero  terms  of 
the  series  described  in  Equation  2 . 3  with  J^O : 

U=E+}^1  (l-3sin2(3)  -^3  °  (3sinP-5sin3P) 

r      r3  2r4 

+  il^i  (3-30sin2P  +  35sin4(3)  (2.5) 

3r5 

+  ^As-°  (15sinB-70sin3J3  +  63sin5(3) 
8r6 

where 
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Figure  2.1  Classical  Orbital  Elements 


k2  -  —  J2Eq 
A30=-J3E® 

This  truncation  of  the  series  introduces  an  error  of  the  order 
0(k23),  where  k2=  .  4841605-  10"3- 0  .  5^5  .  (Brouwer,  1963,  p.  393). 
In  order  to  derive  the  equations  of  motion  for  the 
satellite,  Brouwer  utilized  the  Hamilton-Jacobi  theory  of 
dynamical  systems  to  express  the  Hamiltonian  F=U-:4v2  in  terms 
of  canonically  conjugate  Delaunay  variables .  Letting  a  and  e 
be  the  osculating  semi-major  axis  and  eccentricity, 
respectively,  the  Delaunay  variables  are: 


I/=V|ia  l=mean  anomaly 

G=vVa  (1-e2)  g=argument  of  perigee  (2-6) 


H=v|ia  (1-e2)  cos  I         h=longitude  of  ascending  node 
Using  the  Delaunay  variables  in  Equations  2.6,  the  equations 


of  motion  become 


dL^dF  dl=_dF 

~dt   HI  '  ~3t     1Z 

dG=  dF  dg __  dF 

~dt    1g   '  HE      !JG 

dh=dF  dh=_dF 

~dt    IE   '  ~dt       1h 


(2.7) 


where   the   hamiltonian    is 


m2     \lAk2  1    3^2      3    3       H2      3 

F=-!_+- ![(-_+ ) — +  _(1- — ) — cos  (2g+2f)  1   /o  o\ 

2L2       L6     2  2G2  r3  2     G2  r3  (28) 


In  order  to  express  the  hamiltonian,  F,  in  terms  of  only  the 
Delaunay  variables,  Brouwer  used  the  following  Fourier  series 
expansions : 

^-^+i2pposji 

-iIcos(2g+2f)  =  £  Q.,cos(2g+jl) 

r  j=_ 

The  coefficients  Pj,  Qj  are  power  series  in  the  eccentricity, 
e . 

Using  two  canonical  transformations  through  the  choice 
of  suitable  determining  functions,  S  and  S'  ,  Brouwer  was  able 
to  solve  the  system  of  ordinary  differential  equations  listed 
in  Equations  2.7  in  terms  of  the  mean  elements,  a",  e",  I", 
1",  g",  and  h"  .   The  transformed  hamiltonian,  F**,  depends  only 
on  the  transformed  variables  L",  G",  and  H" .   Replacing  F  by 
F**  in  Equation  Set  2.7,  Brouwer  found  that  L",  G",  and  H"  are 
constants  with  respect  to  time  and  1",  g",  and  h"  are  linear 
functions  of  time.   Consequently,  from  Equation  Set  2.6,  it 
follows  that  a",  e",  and  I"  are  constant  with  respect  to  time. 
Additionally,  as  a  consequence  of  the  transformations  (see 
Brouwer,  1959,  pp.  379  -  393),  Brouwer  was  able  to  separate 
the  changes  in  the  orbital  elements  with  time  into  secular, 
long  period,  and  short  period  variations .    Including  only 
secular  terms  up  to  order  0(k22)  and  periodic  terms  to  order 

10 


0(k2),  Brouwer  found  that  the  secular  variations  are  a 
function  of  only  a",  e",  I",  and  t;  the  long  period  variations 
are  a  function  of  a",  e" ,  I",  and  g";  and  the  short  period 
variations  are  a  function  of  all  six  mean  elements . 
Additionally,  a,  e,  and  I  do  not  experience  any  secular 
variations  to  order  0  (k22)  and  a  does  not  experience  any  long 
period  variation  to  order  0(k2)  .  Using  the  following 
constants, 

a" =semi-major  axis  constant 
e" =eccentricity  constant 
l"=inclination  constant 

n0  =^a"3 

jR$  =Earth' s  equatorial  radius 


and  notations, 


"H^l-e2  6=cosJ 


ii 


2  3 . 0  4  . 


V    =        ~*  •  **  V    —  V    = 

-.112  '3        _//3  f4        _//4  '5        _//5 

a  a  a  a 


Y/2=Y21T4  Y/3=Y3"n"6   Y/4=Y4T1"8  Y/5=Y5T1"10 
Brouwer' s    formulas    for   computing   the   perturbations    are:2 


2In  order  to  avoid  confusion  over  notation,  let  5,,  blf 
and  52  represent  the  secular,  long  period,  and  short  period 
variations,    respectively. 
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Secular  terms 


5.i=.|y'2ii  (-1+302) 

+  JLyVn  [  -15  +  16T|+25Tl2+  (30-9611-90'n2)  02  +  (105  +  14T1+25T|2)  04] 

+  _i£y/4T|e//2  <3-3O02+3504) 
16  (2.10) 


+  J^y'22[-35+2AT\+25T[2  +  (90-192t|-12  6T|2)  02  +  (385+360-H+45T12)  94] 


(2.11) 


(2.12) 


+  _^Y/4[21-9-n2+(-270+12  6ri2)e2+(385-18  9T|2)e4] 
1 6 

5,h=-3y/20  +  —  y'2z[  (-5  +  12tj  +  9t|2)  9+  (-35-36t|-5t|2)  03] 
8 

+^YA{5-3T\2)   (3-702)0 
Long  period  terms 

1         8  [  2         '  (1-502) 

-JLX7Le"i]2[l-3Q2- *f*l ]}  cos2g" 

12  y;     ■  d-502)  y 

+(i.27lT|2sinJ//  (2.13) 

i     2 

■«•  JL  J[_£Tl2s:LnJ"  (4+3e//2)  [1-902-      249\    ])  sing7' 
^%Yl  (1-502)  y 

-^LlTie//2Ti2sinJ//[l-502-      169'     ]  axa3g" 
385y;  '  (1-502) 


e/;5  e 
_     e    oxe  (2    14) 

Ti2tanl" 
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1       '     8  '  2  1-592     12y'2 


57/4    [l-3e2-_8^l,]}sin2 


// 


1-502 


_jfsinJ^{yi  +  lY%(4+9e//2)  [1_902- 


240< 


64 


Jf2j_e//-n3sinJ//{l-5e2-_il51)cos3g 
384Y;2  1-582 


1-502 


]  Jcosg- 


// 


(2.15) 


8.g-{-X*  [  (2+e//2)  -11  (2+3e//2)  92.  40  (2^5e//2)  84 
iy         16  1-502 

_400e//296    +   5y'«      (2+e//2)  _3  (2+3e//2)  02 
(1-502)2        24YV 


8(2+5e//2)04      8Oe//20 


//2Q6 


1-502  (1-502)2 

+   x  w  /8inJ//-  e//Q2  ) 

Tf2  y  3{~^TT~    TIKI77' 
5y' 


]  )sin2g 


// 


lb  e"  sinJ" 

//2X  1    r-i  _QD2.     2404 


+e"sinl"  (26  +  9e//2)  ]  [1-902 


°y  5e//02sinJ//  (4+3e//2)  [3  + 


1-502 
1602 


400' 


8 


]  icosg 


// 


576y'; 


I-4(e"sinJ"  (3+2e//2)- 


1-502      (1-502)2 

,3®2  )  ri  -^Q2_    1604  , 


sin. 


+e//302sinI//[5  +   3202  + 


8O04 


1-502      (1-502) 


]  )cos3g 


// 


(2.16) 


5lh=e"20{-^  [ll+-!£*  +     200y 
1  8  1-502      (1-502)2 


r// 


-  5y/r  n*  16l2  *  4oe'  j^-v 

lly7;  1-502      (1-502)2 

..•"0,    _Y%      ,       ST7,       (4+3e"»)[l-9e'-.glgl] 
inT77    TIsTnT77  1-502 


4y;2    sin. 


15y'. 


sinJ"  (4+3e//2)  [3  + 


160: 


4O0< 


T  1-502      (1-502)2 


]  )cosg 


// 


35y's    ,r. 


516J'. 


0{. 


2sin 


1 [1-502-    169<  ] 

.nl"  1-502 


+sinJ//  [5  + 


800' 


320: 


1-502      (1-502)2 


]  )cos3g 


// 


(2.17) 
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Short  period  terms 


,//3 


62a=a"y'2[(-l+3e2)  (±jT~±) 

r        ^  (2.18) 

+3  (1-02)  1 cos  (2g/+2f/)  ] 


r/3 


62e=^i[(-l+3e2)  i^l-\) 

2e"  r'2     T|3 


+3(l-02)  (_f -—cos{2g'+2f")  ]  (2.19) 

r/3     -q* 


Ti2y'2 


jf.  (1-62)  [Se^cos  (2g'+f)  +e"cos  (2g/+3'f/)  ] 


2e 


Y; 
52J=-Li6sinJ//  [3cos  (2g'+2f"')  (2.20) 

+3e"cos  (2g'+f)  +e"cos  (2g'*3f)  ] 

T13V/  =>"2  s" 

•   »  2  o  / 1  ioo2\  /  a      *,2 .  a 


62l  =  -_L_Li{2(-l+302)  (l_T1MT+l)Sinf 

r' 


4e//  _/2 


+3(l-e2)  [l-±!l!'n2  +  ±ll)sin(2g/+£/)  (2.21) 

r/2  r7 

,//2 


+  (±_^Tl2  +  I)sin(2g/+3£/)]} 


1 2-,// 


52g=_LL£{2(-l+3e2)  [<  *      ti*+*     +i)sinf 

4e"  r'  r' 

+  3(l-02)  [(l-±lll^  +  ±l)3in(2g/+f/) 

r'  r 


+  {±lllr]^±l  +  hsin(2g'+3f/)])  <2-22> 

r'  r'       3 

Y; 

+  _Li{6(-l+5e2)  (f'-I'+e^sinf) 
4 

+  (3-562)  (3sin(2g'+2f) 

+3e/,sin{2g'+f)  +e//sin  (2g;  +3f ') ) 
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y' 
52h  =  -_0{6(f-/-l/+e//sinf-/)  -  (3sin  (2g'  +2f)  (2.23) 

+3e//sin(2g/  +f)  +e//sin  (2g'  +3f) ) 

Using  Equations  2.10  through  2.23,  Brouwer's  algorithm 
for  predicting  a  satellite's  state  vector  is  as  follows: 

•  Begin  with  mean  elements  a",  e",  I",  10",  g0" ,  and  h0"  . 

•  Form  mean  motion  n0 . 


n^sfc^5  (2.24) 


Propagate  "mean"  mean  anomaly  1",  mean  argument  of  perigee 
g",  and  mean  ascending  node  h"  using  Equations  2.10  - 
2.12. 

I^V+notCL+a,!) 

g"=9Q"+n0tbsg  (2.25) 

h"=h0"+n0tb8h 


Apply  long  periodic  corrections  to  1",  g" ,  and  h"  using 
Equations  2.15  -  2.17  and  compute  the  long  period 
variations  5xe  and  ^I  using  Equations  2.13  and  2.14. 

g'=g"+bxg  (2.26) 

h'-h'^byh 


•  Solve  Kepler's  Equation  for  the  eccentric  anomaly  E'  , 
using  1'  and  e"  and  compute  the  true  anomaly,  f  and 
radius,  r' . 

E'-e"s±nE'=l" 


tan—  =, 


(l+e")tan*' 


2~\     (i-e")     2  (2-27) 


r,=  a"(l-e2) 


l+e;/cosf ' 


Apply  short  period  variations  to  1' ,    g' ,    h' ,  a",  e",  and 
I"  using  Equations  2.18  -  2.23. 
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I=l'+62i 

g=g'+62g 

h=h'+62h  {2  28) 

a-a"+o2a 

e=e//+61e+82e 

J=J//+61J+52J 


•    Solve   Kepler's   Equation    for   E,    using   e   and   1    and   compute 
f   and   r . 


E-es±nE=l 


f 


(1+e)  -,~  E 


r_  a(l-e2) 
l+ecosf 


•  Compute  the  position  vector  in  the  conventional  manner. 

x-r [cos  ig+f) cos^-sin (g+f) sin^cosJ] 

y-i [cos (g+f ) sinh+sin (g+f ) cos^cosJ]       (2 . 30) 

z=rsin (g+f ) sinl 


In  addition  to  accounting  for  perturbations  only  due 
the  Earth' s  oblateness  and  asymmetry  about  the  equatorial 
axis,  this  model  has  several  shortcomings  which  Brouwer 
addressed  in  his  article.  The  first  is  a  singularity  at 
critical  inclination  (Ic=cos-1  (1/V5)  =63  .  4°)  for  the  long  period 
corrections  since  many  of  the  terms  have  a  divisor  of 
(5cos2I-l)  .  Second  is  a  singularity  for  very  small 
eccentricities  (near  circular  orbits) .  This  singularity  is 
due  to  the  appearance  of  e"  as  a  divisor  in  the  short  period 
terms.  Finally,  there  exists  a  singularity  in  some  of  the 
elements  for  very  small  inclinations  (orbits  lying  in  the 
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equatorial  plane) .  Brouwer  suggested  that  although 
singularities  in  some  of  the  elements  existed  for  very  small 
eccentricities  and  inclinations,  no  such  singularity  existed 
in  the  coordinates.  Hence,  the  formulas  could  be  modified  and 
expressions  obtained  for  the  perturbations  in  coordinates  for 
these  special  cases. (Brouwer, 1959,  p.  393) 
2.   Lydanne's  Modifications 

Lydanne's  modifications  correct  for  the  singularities 
in  Brouwer' s  model  due  to  the  very  small  eccentricities  and 
inclinations.  He  presented  the  suggested  modifications  in  an 
article  in  the  Astronomical    Journal    in  1963. 

As  Brouwer  suggested  in  his  article,  Lydanne  and 
several  other  investigators  believed  there  existed  well- 
determined  expressions  for  the  coordinates  of  a  satellite  in 
the  case  of  either  very  small  eccentricity  or  inclination, 
because  no  singularity  actually  existed  in  the  coordinates  for 
the  small  eccentricity  or  inclination.  However,  Lydanne 
encountered  difficulty  in  applying  the  approach  suggested  by 
Brouwer.  This  approach  requires  the  Taylor  series  expansion 
of  the  coordinates  in  the  element  perturbation.  Although  the 
first-order  terms  were  regular,  the  higher  order  terms  were 
singular.  (Lyddane,  1963,  p.  555) 

Lyddane  suggested  another  approach.  By  formulating 
the  perturbation  theory  in  terms  of  Poincare'  variables 
instead  of  Delaunay  variables,   the  singularities  can  be 
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avoided.      The  Poincare'    variables   are   defined  in  the   terms   of 
the    Delaunay   variables    as    follows : 

xx=L  yx  =  l+g+h 

x2=^2  {L-G)  cos  (gr+A)    y2  =  -^2  {L-G)  sin  (flr+A)  (2.31) 

x2=yj  {G-H)  cosh  y^-y/ZTcTHTsinh 

Using  a  method  similar  to  Brouwer' s  method  of 
solution,  Lyddane  presented  with  some  algebraic  manipulation 
the  following  formulas : 

a=a//H-5a  ^  32) 

l+g+h=l"+g"+h"+b  (I+g+/i) 

ecosl=  (e;/+5e)  cosl" -e^Slsinl"  ^2  33) 

esinl=  (e;/+5e)  sinl" +e//5lcosl// 


J  J"       I"     5j 

sin  (_)  cosh=  [sin  (_ __)  +cos  (_ __)  ___]  cosh" 

-sin(J_)  5/Jsinh'7 


sin(-£)sinh=[sin(_^.)  +cos  (i-)  -L^lsinh" 

+sin  ( )  5hcosh// 

2 


(2.34) 


where 


5  =5X  +o2 

Additionally,  Lydanne  discovered  that  the  use  of  1"  instead  of 
1'  in  Equations  2.21  and  2.22  introduces  an  error  of  at  most 
order  0 (k22) .  Since  Brouwer  model  computes  long  period  terms 
to  only  to  order  0  (k2)  ,  Lyddane  suggested  that  the  short 
period  corrections  may  be  computed  using  1"  instead  of  1' 
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(Lydanne,  1963,  p.  557) .   Lyddane  claimed  this  approach  would 

overcome  the  e"  =  0  and  I"  =  0  singularities  in  the  periodic 

variations.  His  approach  removed  the  singularity  e"=0  for  all 

terms  and  the  singularity  at  I"=0  for  all  terms  except  §XI 

(Equation  2.14)  (Solomon,  1991,  p.  8). 

Lyddane' s  algorithm  for  propagating  an  element  set  is 
as  follows : 


•  Compute  1",  h",  5e,  e"5l,  and  5 1  using  Brouwer's  formulas 
(Equations  2.10  -  2.23)  .    To  avoid  a  small  difference 
between  two  large  quantities,  use  the  following  identities 
in  Equation  2.19  to  compute  5e . 

1        a  II      3  1  a  I' 

(^-/)[(^7/)  -Ti-3]=-\(e//Ti  +  -^- 
e"  r"  -q6      1+T| 

+  3cosf//+3e//cos2f//+e//2cos3f//)         „   m 

(2.35) 

(-%)  [(^-V-Tl-4]=^-(e//+3cosf// 
e"  r"  r\s 

+3e//cos2f//+e//2cos3f//) 


•  Compute  a  and  1+g+h  using  Equation  Set  2 . 32 .  To  reduce 
amount  of  error  introduced  by  finite  precision,  combine 
Equations  2.15  -  2.17  for  5X  (1+g+h)  and  Equations  2.21  - 
2.23  for  52  (1+g+h)  . 

•  Compute  sin (^1") 5h. 

sin  ( —  )  on- 


Tii  (2.36) 

2cos  — 
2 


•  Solve  for  e,  I,  1,  and  h  using  Equation  Sets  2.33  and 
2.34. 

•  Compute  coordinates  and  velocity  components  in  the  usual 
manner . 


19 


r= 


coshcos {g+f ) -sinhsin {g+f) cosl 

sinhcos (g+f) +coshsin {g+f)  cos J 

sin {g+f) sinX 


-coshsin{g+f) -sinhcos {g+f)  cosl 
v=  -sinhsin  {g+f)  +coshcos  {g+f)  cosl 
cos {g+f) sinl 

r^rr 

-»_  [  (esinf)  r+  (1+ecosf)  v] 


(2.37) 


l\{a 


3.   NAVSPASUR  Modifications 
a.  Atmospheric  Drag 

Brouwer' s  model  considers  only  the  perturbations 
due  to  zonal  harmonics  (Earth's  oblateness  and  asymmetry  about 
the  equatorial  axis) .  For  near-earth  orbits,  the  magnitude  of 
the  perturbative  acceleration  due  to  atmospheric  drag  can  be 
of  the  same  order  as  the  magnitude  of  the  perturbative 
acceleration  due  of  the  earth's  oblateness  (Knowles,  1992,  p. 
226) .  Figure  2.2  shows  the  relative  orders  of  magnitude  of 
the  perturbative  accelerations  at  various  altitudes  above  the 
Earth.  Fluctuations  in  the  density  and  relative  velocity  of 
the  atmosphere  to  the  satellite  make  the  perturbative 
acceleration  due  to  atmospheric  drag  difficult  to  model .  To 
compensate  for  the  drag  perturbation,  the  NAVSPASUR  model 
utilizes  a  simplified  sub-model  for  drag  (Solomon,  1991,  pp. 
9  -  10) .  Atmospheric  drag  is  modeled  by  time  derivatives  of 
the  "mean"  mean  anomaly,  1",  using  two  parameters  M2  and  M3 : 
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10M     — 


10*2     — 


10*-2      — 


1QA-4       — 


1 


10  a-6     _ 


.2c 


io*-a   _ 


10*-10    — 


10-M2 


note:  typical  satellite 

characteristics 

assumed  for  effects  such  as 

drag,  radiation  pressure,  etc. 


^onopo/e 


thermal  emission 


indirect  oblation 


100  n.m. 


1000  rum. 


10,000  n.m. 


Figure  2.2  Perturbative  Accelerations  on  Satellites 
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l"=l"a+mt+M!tt2+M3t3  (2.38) 

where  t  is  the  elapsed  time  from  epoch.  Using  the  following 
relationships , 

/n=n0(l+5sI)*n0 
and 

it    can  be    assumed 

a//3«/7?-2  (2.39) 

Differentiating  Equation   2.39    and  working   to    first    order, 

.__  2    in 
a~    -3—  (2.40) 

Using  m=2M2  from  Equation  2.38  and  substituting  into  Equation 
2.40  yields  following  model  for  the  drag  effect  on  the  semi- 
major  axis  : 


«i=- 


// 


**-M2  (2.41) 


3/n 

Assuming   a   spherically   symmetric   atmosphere   with   an 
exponential  density  variation 

p(r)=p0exp(-£^)  (2.42) 

H 

where  p0  is  the  density  at  radius  r=r0  and  H  is  the  scale 

height,   the   rates   of   change   in   the   semi-axis,   a,   and 
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eccentricity,  e,  with  respect  to  the  eccentric  anomaly,  E,  may 
be  expressed  as  (Danby,  1988,  pp.  330  -  331) : 


||=-a26p0exp(^cos£)^ 

de        _  *  _    _  ,  Aae 


1+eco^(l+ecosE) 


1-ecosE  (2.43) 


l+ecosE 


dE        '  K°     H  N  1-ecosE 

where  5  and  A  are  constants.  Estimating  the  average  change  in 
a  and  e  by  integrating  Equation  Set  2.43  over  one  orbit  to 
lowest  order  of  e  yields : 

Ae=_en!0  (2.44) 

Aa   a 

Assuming  a=l  and  substituting  Equation  2.41  into  2.44,  the 
model  for  the  decay  in  eccentricity  is : 

6=e'ya=_4eV  (2  45) 

a"  3/n   2 

b.       Near  Critical   Inclination 

Neither  Brouwer  nor  Lyddane  were  able  to  correct 
for  the  singularity  in  the  long  period  terms  at  the  critical 
inclination  (Ic=cos_1  (l/v5)  =63  . 4° )  .  To  prevent  an  overflow 
error  in  the  subroutine  PPT2,  the  factor  (l-5cos2I") _1  is 
approximated  by  the  function 

T2_  l-exp(-100x2)  (2.46) 

x 

where  x=l-5cos2I".  However,  with  a  small  value  of  x,  T2 
cannot  be  computed  directly.  By  using  a  product  expansion  of 
Equation  2.46, 
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10 

E 

n=0 


T2=-  (l-exp(-Px2)  )  J  [  (l+exp(-2nPx2)  )       (2.47) 


where   P=100/211.     To   remove  the   factor  of  x  from  the 
denominator,  the  first  factor  is  approximated  by 


(l-exp(-px2))  =p  ^  ,   )n  &"x2a  (2<48) 

x  v    fa  (22+1)  ! 


Figure  2 . 3  shows  a  comparison  of  T2  using  Equations  2 . 47  and 
2.48  and  1/x  in  vicinity  of  the  critical  inclination. 
(Solomon,  1991,  pp.  10  -  11) 

C.   PPT2 

PPT2  is  the  Fortran  subroutine  which  implements  Brouwer's 
model  with  Lyddane's  and  NAVSPASUR' s  modifications.  PPT2 
completes  several  satellite  propagation  tasks. 

•  Predicts  a  satellite  state  vector  at  future  time. 

•  Computes  partial  derivatives  of  the  position  vector  with 
respect  to  the  orbital  elements  (used  in  Method  of 
Differential  Correction  to  modify  set  of  stored  elements 
in  light  of  current  observations) . 

•  Predicts  the  time  and  state  vector  of  satellite  for  a 
given  true  anomaly. 

PPT2  is  composed  of  sections  which  accomplish  the  tasks 

named  above.    The  sections  are  delineated  by  conditional 

breakpoints .   Control  over  which  section  to  execute  is  handled 

by  a  set  of  control  variables .    Data  is  passed  to  the 

subroutine  through  three  control  variables  and  four  Common 

blocks.  A  complete  description  of  the  input/output  of  PPT2  is 
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Figure  2.3  Near  Critical  Inclination 


contained  in  Appendix  A. 

NAVSPA3UR  represents  each  tracked  object  by  a  set  of 
stored  elements .  The  stored  elements  are  the  mean  orbital 
elements  plus  two  drag  parameters,  M2  and  M3 .  The  stored 
orbital  elements  are  the  same  as  the  classical  orbital 
elements  used  in  Brouwer's  model  with  two  exceptions.  The 
mean  semi-major  axis,  a",  is  replaced  by  the  variable,  m,  the 
mean  "mean"  motion. 


25 


m=  (1+5,1)  n0  =  (1+5,2)  a""273  (2.49) 

The  other  exception  is  that  the  cosine  of  the  mean 
inclination,  cos  I",  is  stored  instead  of  the  mean 
inclination,  I".  Thus  the  stored  elements  are:  1",  m,  M2,  M3, 
e",  g",  h",  and  cos  I". 

For  NAVSPASUR  to  use  the  formulas  from  Brouwer' s  model, 
the  mean  semi-major  axis,  a",  must  be  determined  from  the  mean 
"mean"  motion,  m.  Since  a"  is  implied  in  the  5,1  (Equation 
2,10),  no  direct  solution  is  possible.  Therefore,  PPT2 
recovers  a"  using  an  initial  approximation  for  a"  and  solving 
Equation  2.48  for  a"  iteratively.  The  initial  guess  for  a"  is 
a0"=m~*.   Then,  for  i=l,  .  .  .  ,  5 

(5.D ,=4^ (-1+30*)  (2  50) 

+-^Y/2iTU-15  +  167l+25Ti2 

+  (30-96TJ-90T12)  02+  (105+14T|+25lf)  94] 

+  15v/..T>e//2 


T^Y/4iTle/  (3-3002+3504) 
1 6 


m 
and  the  mean  semi-major  axis,  a"0,  is  taken  to  be  equal  to  a5" 
(Solomon,  1991,  p.  15) . 

Another   difference   between   the   model   and   actual 
computations  by  PPT2   is  the  computation  of  the  secular 


26 


corrections  for  the  mean  argument  of  perigee,  g",  and  mean 
ascending  node,  h"  .  The  secular  corrections  for  g"  and  h"  are 
computed  in  terms  of  mean  anomaly,  1",  instead  of  time  using 
5.g  and  5.h  from  Equations  2.11  and  2.12. 


dg"  dg" 

dg"  _  dt  dt       n0bag _  bMg 

dl"       dl"  m~~    m~     ma"3/2 

"dT 


(2.51) 


//      ,-!»,// 


dh"       dh 


dh"  _ 

dt 
dl" 

dt 
m 

_n0b3h_      bah 

dl" 

m         ma"3/2 

(2.52) 


dt 


Once  the  secular  corrections  to  the  "mean"  mean  anomaly  are 
computed  via  Equation  2.38,  Equations  2.51  and  2.52  are  used 
to  correct  g"  and  h" . 

g"-g0"+f£Al 

wi  <2-53> 

0  dl 

where 

Al=mt+M2t2+Af3t3 

With  all  of  its  conditional  breakpoints,  PPT2  completes 
only  those  tasks  required  by  the  user.  Prior  to  completing 
any  of  the  fore-mentioned  tasks,  the  user  is  required  to  make 
an  initial  call  to  subroutine  PPT2  to  recover  a"  and  compute 
the  secular  corrections.  During  this  initial  call,  many 
variables  are  set  which  will  be  used  in  subsequent  calls, 
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increasing  efficiency.   Therefore,  at  least  two  calls  to  PPT2 
are  required  to  complete  any  of  its  tasks.3 

Because  the  main  objective  of  this  thesis  is  the 
parallelization  of  the  satellite  state  vector  prediction  task 
of  PPT2,  only  this  algorithm  will  be  presented.  For  a 
complete  description  of  the  other  tasks  see  (Solomon,  1991) .4 
The  algorithm  implementing  the  NAVSPASUR  model  is  as  follows: 

•  Begin  with  the   stored  mean  elements  plus  the  drag 
correction  terms  (10",  m,  M2,  M3,  e0",  g0" ,  h0",  and  cos  I") 

•  Cdmpute  T2  using  Equations  2.47  and  2.48. 

•  Recover  mean  semi-major  axis,  a0",  from  the  mean  motion, 
m,  by  iteration  using  Equation  Set  2.50. 

•  Compute  the  following  dimensionless  quantities: 

Y  =  *2   v  =  ^3  °   v  =  k*        y   =  A5.o 

'2   a//2     f3   a,/3     14   a„4     15   a,l5  (2.54) 

Y/2=Y2'n"4  Y^YaT6  Y/4=Y4TI"8  Y/5=Y5T1"10 


Compute  drag  corrections  for  the  semi-major  axis,  a",  and 
eccentricity,  e",  using  Equations  2.41  and  2.45. 

Using  Equation  2.38  and  Equation  Set  2.53,  propagate  the 
mean  anomaly,  1",  argument  of  perigee,  g",  and  the 
ascending  node,  h",  considering  only  the  secular 
corrections.  (h"  may  be  optionally  corrected  for  the 
Earth's  rotation  using  Equation  2.56,  where  CO  is  the 
Earth' s  angular  velocity  and  Tc  is  the  time  at  which  the 
direction  of  the  Greenwich  meridian  and  equinox  direction 
coincides . 


3For  a  complete  description  of  calling  options  of  PPT2, 
see  (Solomon,  1991,  pp.  11  -  24)  . 

4A  complete  listing  of  subroutine  PPT2  is  contained  in 
(Solomon,  1991,  pp.  39  -  55)  . 
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2"=I0"+AI 
„//  _„  //+  dg 


g»=g0».-Al  (2.55) 


dl 
0       dl 


h"=h"-<£i{t-Tc)      [optional]  (2.56) 


•   Propagate    the   eccentricity,-    e ",    and   the   semi-major   axis, 
a". 


e"=min[max[0,  e0"+et]  r  .  99999] 
a"=max[l,  a0"+at] 


0       w„j ,  . „.,„.,...,  (2.57) 


•  Solve  Kepler's  Equation  using  Steffensen's  Method  and  use 
this  result  to  compute  cosine  and  sine  of  the  true  anomaly 
and   radius . 


E"=I"+e"  sins' 


cosf"=^±l 
l-e"cosE" 


sin£"= 


,,_  \Jl-e"2   sinE 


(2.58) 


// 


l-e"cosE' 

I f-rtZ 


r//=  a"T] 


ii 


1+e" +cosf 


•  Using  Equations  2.13-2.17,  compute  long  period  correction 
terms  for  e,  lr  h,  and  I  in  the  following  forms  replacing 
g'    and    (1-502)"1   with   g"    and   T2,    respectively: 

51e  =  VZ,.Elcos2g//  +VL£J2sing//  +  VZ,.E3sin3g// 
e//51l^(VL.EIsin2g//-VLL2cosg//-VZ,£;3cos3g//) 
sin  I"  bxh  =  VLHl  Isin2g"  ^VLH2Icosg"  +VLH3Icos3g/f     (2.59) 

6,1-- L_ 

1         T[2tanl" 


where 
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i  5y7 

VL£l=e//T|2  (±Y/,[l-H02-40e4-  T2]  - LJ-  [  1 -302-804-  T2] 

8  '  2  12YS 

VLE2=V?SxnI"  (y'  +  5y  5  (4+3e//2)  [l-902-2404-  T2]  ) 

Ay'  2  16 

35y; 

VLE3-- LLe//2r\za±nIn  [l-502-1604-  T2] 

384^2 

15y' 

VLL2=VI,£2  + L_!e//2T|2sinJ//  [l-902-2404-  T2] 

32Y'2 


ij'M-Xifll+i 
5Y;4 


VLHU=e//20sinJ//  (  -  JL£  [11  +8O02-  T2+2OO04-  T22] 

8 


[3  +  1602-  T2  +  4O04-  T22]) 


12y'a 

VLH2J=  e"9WV     Ts  (4+3e//2)  [l-902-2404-  T2] 
4y'2  16 

15Y7  sin2J// 

(4+3e//2)  [3+1602-  T2  +  4O04-  T22] 


8 

35y'  i 

VLH3I=- Li_e//30{jl[l-502-1604-  T2] 

576Y72  2 

+  sin2J/;  [5+3202-  T2  +  8O04-  T22]) 


•  Representing  the  quantity  (1+g+h)  by  z,  compute 
51z=51l+51g+51h  by  combining  individual  parts  prior  to 
computing. 

51z  =  VL51sin2g//  +VLS2cosg"  +VLS3cos3g/f  (2  .  60) 

where 

VLS1=   (T|  "1)   [y'    (1-1102-4O04-  T2)  - 1-1  (l-302-804-  12)  ] 

+  ±^J?[Y/2(H+8O02-  T2+2OO04-  T22)  -y1 2(3+ie$2-  T2  +  4O04-  T2)  ] 
8 

+25e//206-  T2'(y'2-  —  )  -  e''    [Y/2(l-3302-2OO04-  T2) 

5  16 

-Y/3d-902-4O04-  T2)  ] 
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- 1 

7i[3(e//2-T|3  +  ll]  [l-902-2404-  T2] 


VLS2=e',sinI,,{     X     [  (Tl  +  _L_)  +     0 ..  ]  [Y/,+-- Lf  (4+3e//2)  (l-902-2404-  T2)  ] 
4Y',  1+T|         1+0  J       16 


10Y/5r././/2 


64y7 

f  5.9  (1-8)  [  (4+3e//2)  (3  +  1692-  T2+4O04'  T22)  ] 


32Y': 


VLS3=e"sinl"{    35Y  5  [3T|2-3-(2  +  9..)e//2]  [l-502-1604-  T2] 
1152v'  TT^ 


35^5  r.,/2 


576Y'; 


1152Y' 

[e//20(l-0)  (5+4O02-  T2  +  8O04-  T22)  ] 


Using  Equation  2.19  and  Equation  Set  2.35,  compute  the 
short  period  correction  term  for  e,  replacing  g'  and  f 
with  g"  and  f",  respectively.  Then  combine  long  and  short 
period  corrections  to  form  5e . 


52e=J_£{(-l+302)  [e//2cos3f//+3e/'cos2:f/' 


+3cosf-"+e"(7l+     *    )]  {2S1) 


1+T| 

+  3<l-02)  [e//2cos3f"+3e"cos2f// 

+  3cosf//+e"cos  {2g"  +2f")  ] 

-T|2(l-02)  [3e"cos  (2g"+f")  +e"cos  (2gr"  +3f;/ ) } 

be=bxe+b2e  (2.62) 


•  Using  Equations  2.20  and  2.21,  compute  the  short  period 
correction  terms  for  I  and  1  in  the  following  form 
replacing  g'  and  f  with  g"  and  f",  respectively.  Then 
combine  respective  long  and  short  period  terms  to  form  5l 
and  e5l. 
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52J=-LiesinJ//  [3cos(2g"+2£")  +  3e;/cos  {2g"  +fn  ) 


2 
+e"cos(2g"+3f")  ] 


e      v/2 


5,i=-^_li{2(-i+3e2) 


4 
v[   (l+e^cos£^)(2+e//cos^/),.,]oV^//  (2    63) 

l[{,_  il+e"cost")  l2+e"cosf")  ,..„, 2gn+f,f) 
+  (   (1+e^cosf^)  (2+e^cosf")  +  1} 
x  sin(2g"+3f")  ]} 


e5l=eo1l+eo2I 

•    Using   equation   2.23,    compute    short   period   correction    for 
h   in   the    following   form. 

sinJ//52h=-2isinJ//e[6(f//-l//e//sinf//) 

2  (2.65) 

-3sin(2g"+2£")  +3e"sin  (2g"  +f " ) 
+e"sin(2g"+3£")  ] 


•    Using    Equation    2.36    and    relationship    5h=51h+S2h,     compute 
sin (HI") 5h. 


sin(4_)5h=  j  (2.66) 


2cos  (__) 
2 


•  Combine  terms  of  52z=52l+52g+52h,  replacing  g'  ,  1'  ,    and  f ' 
with  g",  1",  f",    respectively.   Then  compute  z. 

52z=-e//252l iI23 

2  2  Tl3 

v'  (2.67) 

-JLi  [6  (1-20-502)  (f^-e^inf"-!") 
4 

-(3+2O-502)  (3sin(2g"+2f") 

+  3e"sin(2g"+2r"")  +e"sin  (2g"  +3f"  )  ] 
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z  =  l//+g//+h//+51z  +  62z  (2.68) 

•    Compute    a   using  Equation   2.18. 

a  =  a"+d2a  (2.69) 


•  Solve  Equation  Sets  2.33  and  2.34  explicitly  for  e,  1,  I, 
and  h. 

e=\/(5e)2+(e5l)2 

.  , -i  ,  5esinl/7  +e5lcosl//  v 

l=tan  ( -_ , _) 

becosln-ebls±nl" 
cos J=l -2  [  (5 J)  2  +  (sin  (.£)  bh) 2]  (2  .  70) 

Sls±nh"+sin{*)  Shcosh" 
h=tan_1  ( Z ) 


5Jcosh"-sin(.£)  Shsinh" 


•  Compute  g . 

g^z-l-h  (2.71) 


•  Solve  Kepler's  Equation  again  using  Steffensen's  Method 
and  compute  cos  f,  sin  f,  and  r. 


E= 

2+esinE 

cosf= 

cosE-e 

1-ecosE 

sinf= 

yl-e2   sinE 

1-ecosE 

r- 

ar\2 

1+e+cosf 

(2.72) 


Compute  the  satellite  state  vector  using  Equation  set 
2.37. 
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III.  PARALLEL  COMPUTING 

A .   OVERVIEW 

1 .   Definition 

The  complexity  of  scientific  computing  today  demands 
faster  computers.  Greater  detailed  models  require  a 
substantial  amount  of  computation.  Faster  computers  are 
needed  to  provide  the  results  of  the  computation  in  a  timely 
manner.  In  response  to  this  demand,  computer  engineers  have 
taken  two  approaches  to  achieve  faster  performance. 

The  first  approach  is  to  increase  the  speed  of  the 
circuitry.  Although  great  advances  have  been  made  in 
increasing  the  speed  of  computer  circuitry,  this  increase  in 
speed  is  bounded  by  the  speed  of  light.  Additionally,  the 
specific  design  and  manufacture  requirements  for  further 
increases  in  speed  are  quite  costly. 

The  second  approach,  parallel  computing,  provides  an 
alternate  means  to  achieve  faster  computer  performance  using 
affordable  circuitry  design.  Many  articles  and  books  have 
been  written  describing  the  methods  to  exploit  this  approach. 
The  terms  parallel  computing  and  parallel  processing  seem  to 
be  used  interchangeably  in  these  texts.  For  the  purpose  of 
this  thesis,  parallel  computing  and  parallel  processing  are 
assumed  to  be  synonymous .    In  this  emerging  field,  there 
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exists  slight  differences  in  how  to  define  parallel  computing. 

One  definition  which  best  encompasses  the  breadth  of  the  field 

may  be  taken  from  (Hwang,  1984,  p.  6) : 

Parallel  processing  is  an  efficient  form  of  information 
processing  which  emphasizes  the  exploitation  of  concurrent 
events  in  the  computing  process .  Concurrency  implies 
parallelism,  simultaneity,  and  pipelining.  Parallel 
events  may  occur  in  multiple  resources  during  the  same 
time  interval;  simultaneous  events  may  occur  at  the  same 
time  instant;  and  pipelined  events  may  occur  in  overlapped 
time  spans .  These  concurrent  events  are  attainable  in  a 
computer  system  at  various  processing  levels . 

In  his  book,  Hwang  describes  the  processing  levels  to 
be:  the  program  level,  the  task  level,  the  inter-instruction 
level,  and  the  intra-instruction  level.  The  program  level 
involves  executing  multiple  programs  by  means  of 
multiprogramming,  time  sharing,  and  multiprocessing.  This 
level  is  concerned  with  the  design  of  parallel  processing 
systems  which  is  beyond  the  scope  of  this  thesis.  Therefore, 
for  the  purposes  of  this  paper,  the  definition  of  parallel 
computing  is  defined  as  the  efficient  form  of  information 
processing  emphasizing  the  concurrent  computations  and 
manipulation  of  data  to  solve  a  single  problem. 
2 .   Classification  of  Parallel  Computers 

a.  Type   Classifications 

Implicit  in  the  definition  of  parallel  computing 
are  three  methods  to  achieve  parallelism.  The  three  methods 
are  temporal  parallelism,  spatial  parallelism,  and 
asynchronous  parallelism  (Hwang,  1984,  p.  20)  .   These  methods 
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offer  a  manner  to  classify  the  various  types  of  parallel 
computers . 

The  first  type  is  a  pipeline  computer.  Pipeline 
computers  perform  overlapped  computations  to  exploit  temporal 
parallelism.  Computations  are  divided  into  a  number  of  stages 
or  segments  with  the  output  of  one  segment  being  the  input  of 
another.  Analogous  to  a  factory  assembly  line,  if  each 
segment  works  at  same  speed,  the  work  rate  of  the  pipeline  is 
the  sum  of  work  rates  of  the  segments .  The  maximum  work  rate 
is  achieved  once  the  pipeline  is  full.  An  example  of  a 
pipeline  computer  is  the  Cray— 1 . 

The  second  type  is  an  array  processor.  Array- 
processors  use  multiple  synchronized  processing  elements,  to 
achieve  spatial  parallelism.  Each  processing  element  performs 
simultaneously  identical  operations  on  different  data.  An 
example  of  an  array  processor  is  the  Connection  Machine. 

The  third  type  is  a  multiprocessor. 
Multiprocessors  may  achieve  asynchronous  parallelism  through 
a  set  of  interactive  processors  (nodes) .  These  processors  are 
capable  of  performing  independent  operations,  but  share 
resources  such  as  memory.  An  example  of  a  multiprocessor  is 
the  Cm*  of  Carnegie-Mellon  University. 

The  final  type  is  a  refinement  of  the 
multiprocessor,  the  multicomputer.  Multi computers,  like 
multiprocessors,  achieve  asynchronous  parallelism  through  a 
set  of  interactive  processors.  But  these  processors  each  have 
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their  own  local  memory.  An  example  of  a  multicomputer  is  the 
INTEL  iPSC  hypercube .  Because  each  processor  has  its  own 
memory  and  may  perform  independent  operations,  multicomputers 
offer  the  user  an  added  degree  of  freedom  in  programming. 
However,  interaction  between  the  processors  (nodes)  may 
require  synchronization  to  be  explicitly  programmed  in  the 
multicomputer  code. 

The  four  type  classifications  are  not  necessarily 
mutually  exclusive.  Many  commercially  available  array 
processors,  multiprocessors,  and  multicomputers  employ 
pipeline  processors  to  complete  operations  such  as  vector 
processing. 

b.      Architectural   Classifications 

Parallel   computers   may   also   be   classified 

according  to  their  architecture.   One  scheme  for  classifying 

digital  computers  was  introduced  by  Michael  J.  Flynn  in  1966. 

He   introduced  a  scheme  to  classify  computers   into  four 

categories  based  on  the  multiplicity  of  instruction  and  data 

streams .   An  instruction   stream   is  a  sequence  of  instructions 

to  be  executed  by  the  computer.   Likewise,  a  data   stream   is  a 

sequence   of   data  used  by   the   computer.     Flynn' s   four 

categories  are  (Flynn,  1966) : 

1.  Single  instruction  stream,  single  data  stream  (SISD) . 
Most  serial  computers  fall  in  the  SISD  category. 
Although  instructions  are  completed  sequentially,  this 
category  includes  overlapping  instructions  (pipelining) . 
Therefore,  pure  pipeline  processors  also  belong  to  this 
category . 
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2.  Single  instruction  stream,  multiple  data  stream  (SIMD) . 
Array  processors  fall  into  this  category.  The  array 
processor  receives  a  single  set  of  instructions,  but  each 
element  receives  and  manipulates  its  own  set  of  data. 

3.  Multiple  instruction  stream,  single  data  stream  (MISD) . 
No  current  computers  fall  into  this  category.  This 
architecture  has  been  challenged  as  impractical  by  some 
computer  designers  (Hwang,  1984,  p.  34) . 

4.  Multiple  instruction  stream,  multiple  data  stream  (MIMD) . 
Most  multiprocessors  and  multicomputers  fall  into  this 
category.   The  INTEL  iPSC  is  a  MIMD  machine. 

c.       Topological   Classifications 

Another  classifying  scheme  for  parallel  computers 
is  by  the  topology  of  the  inter-processor  connections.  These 
connections  are  the  means  through  which  communication  between 
individual  processors  is  conducted.  This  classifying  scheme 
applies  only  to  array  processors,  multiprocessors,  and 
multicomputers.  Some  of  the  general  topologies  are  the  mesh, 
the  pyramid,  the  butterfly,  and  the  hypercube .  Figures  3.1 
and  3.2  show  examples  of  the  mesh  and  hypercube  topologies. 
The  topology  may  also  be  customized  to  meet  specific  computing 
needs.  For  a  more  comprehensive  discussion  of  the  various 
topologies  see  (Quinn,  1987,  pp.  25  -  30) . 
3 .   Measurements  of  Performance 

With  faster  computation  speed  being  the  ultimate 
objective,  certain  measures  are  needed  to  determine  the 
effectiveness  of  parallel  computing  versus  serial  computing  to 
achieve  this  objective.  Computation  speed  depends  on  many 
factors  that   include  the  computer  hardware  design,   the 
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Figure  3.1  Two-dimensional  meshes 

technical  specifications  of  its  components,  and  the  algorithm 
or  method  of  solution  used  to  complete  the  computations .  Two 
common  measures  of  effectiveness,  accounting  for  both  the 
hardware  and  the  algorithm,  are  speedup   and  efficiency. 

Speedup,     Sp,  refers  to  the  ratio  between  the  time 
taken  to  execute  a  set  of  computations  serially,  Ts,  and  the 
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Figure  3.2  Hypercubes  of  dimension  zero  through  four 


time  taken  to  complete  the  same  set  of  computations  exploiting 
parallelism,  Tp, 


(3.1) 
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Although  speedup  compares  the  time  taken  for  the  serial 
computer  program  and  the  parallel  computer  to  complete  the 
same  set  of  computations,  this  set  does  not  imply  both 
programs  follow  the  same  algorithm.  Parallel  programs  often 
contain  additional  operations  to  accommodate  parallelism.  In 
order  not  to  be  misleading,  speedup  should  compare  the 
parallel  computer  program  with  the  most  efficient  serial 
computer  program.  Many  suggest  that  times,  Tp  and  T,,  be 
measured  using  a  particular  parallel  computer  and  the  fastest 
serial  computer.  However,  the  variation  in  the  technical 
specifications  of  both  computers  may  cloud  the  issue  whether 
parallel  processing  is  beneficial.  To  be  an  effective 
measure,  the  computing  technical  specifications  of  the 
individual  processor  of  the  parallel  computer  and  the  serial 
computer  should  be  equal.  Therefore,  for  the  purpose  of  this 
thesis,  speedup,  Sp,  is  measured  by  the  ratio  of  time,  Tlr 
taken  by  the  parallel  computer  executing  the  most  efficient 
serial  algorithm  and  the  time,  Tp,  taken  by  the  same  parallel 
computer  executing  the  parallel  algorithm  using  p  processors . 

«,  -■£  <3-2> 

p 

The  other  measure,  efficiency,  accounts  for  the 
relative  cost  of  achieving  a  specific  speedup.  Relative  cost 
is  measured  as  the  number  of  processors  required  to  achieve 
the   speedup.    Efficiency,      Ep,   is  the  ratio  between  the 
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speedup,  Sp,  and  the  number  of  processors,  p  (the  theoretical 
speedup) . 

EB   =.£?  (3.3) 

p   P 

Many  factors  could  possibly  limit  the  possible  speedup 
and  efficiency  of  a  parallel  program.  These  factors  include 
the  number  of  sequential  operations  that  cannot  be 
parallelized,  the  communication  time  between  individual 
processors,  and  the  time  each  processor  is  idle  due  to 
synchronization  requirements.  Many  have  argued  these  factors 
severely  restrict  the  benefits  of  parallel  computing.  Despite 
these  factors,  research  has  shown  parallel  computing  can  be  an 
effective  means  to  reduce  computation  time  (see  Quinn,  1987, 
pp.  18  -  20  and  Gustafson,  1988)  .  Considering  only  the  number 
of  sequential  operations  in  a  program  that  cannot  be 
parallelized,  Amdal's  Law  states  that  the  maximum  speedup,  Sp, 
achievable  by  p  processors  is: 

S    < I (3.4) 

p      f+(l-f)/p 

where  f  is  the  fraction  of  operations  that  must  be  performed 
sequentially  (Amdahl,  1967,  pp.  483  -  485).  Equation  3.3 
provides  an  initial  means  to  determine  if  an  algorithm  is  a 
good  candidate  for  parallelization . 
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B.   INTEL  iPSC/2  HYPERCUBE 

To  maximize  speedup  and  efficiency,  parallel  algorithms 
must  be  developed  with  a  specific  parallel  computer  in  mind. 
In  determining  the  parallel  computing  potential  of  the 
NAVSPASUR  satellite  motion  model,  an  INTEL  iPSC/2  hypercube 
computer,  located  at  the  Department  of  Mathematics  at  the 
Naval  Postgraduate  School,  was  used.  The  iPSC/2  is  a  MIMD 
multicomputer  with  a  hypercube  topology.  The  iPSC/2  consists 
of  a  system  resource  manager  and  eight  individual  processors, 
called  computing  nodes.  The  system  resource  manager,  often 
called  the  host,  provides  the  interface  between  the  user  and 
the  computing  nodes.  The  host  is  a  386-based  computer,  which 
may  be  used  to  process  data  in  addition  to  providing  the 
interface  for  the  user. 

The  computing  nodes  are  complete,  self-contained  INTEL 

80386  microprocessors.   Each  computing  node  also  contains  a 

80387  numeric  coprocessor,  its  own  local  memory,  and  a  Direct- 
Connect  communications  module  (DCM) .  Each  computing  node  may 
be  augmented  by  a  Vector  Extension  (VX)  module  for  pipelined 
vector  operations.  The  iPSC/2  located  at  the  Naval 
Postgraduate  School  contains  only  one  node  with  the  VX  module . 

Communications  among  the  nodes  and  the  host  are  completed 
through  message  passing.    The  Direct-Connect  Module  (DCM) 
allows  messages  to  be  sent  directly  to  the  receiving  node 
without  disturbing  the  other  node  processors.  Other  hypercube 
designs  require  messages  to  be  stored  and  forwarded  along  a 
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path   of   connected  nodes   until   the   message   reached  the 
receiving  node . 

The  iPSC/2  uses  a  UNIX  operating  system  and  may  be 
programmed  in  Fortran  and  C  languages .  A  more  detailed 
listing  of  the  INTEL  iPSC/2  hypercube' s  technical 
specifications  is  contained  in  Appendix  B. 

C.   METHODS  OF  PARALLEL I ZAT I ON 
1 .   Vectorization 

Vectorization  is  one  method  to  parallelize  an  existing 
sequential  program.  Vectorization  is  the  process  of  converting 
blocks  of  sequential  operations  into  vector  instructions  that 
may  be  pipelined.  A  simple  example  of  vectorization  using 
Fortran  is  the  following: 
Sequential  Code: 

Do   10  i=l,N 
10  z(i)=x(i)+y(i) 

Vector  Code  (VAST2) : 

call  vadd{N, x, 1, y,lf  z, 1) 

To  assist  in  the  vectorization  of  a  serial  program,  there 
exist  many  commercially-available  vectorizing  compilers . 
(Quinn,  1987,  pp.  233  -  235) 

Vectorizing    compilers    automatically    vectorize 
sequential  program  code  for  execution.  Additionally,  they  may 
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identify  to  the  user  program  constructs  and  data  dependencies 
that  limit  potential  vectorization.  Vectorization  cannot  be 
maximized  solely  by  a  compiler.  Most  vectorizing  compilers 
have  a  limited  ability  in  recognizing  sequential  blocks  to  be 
vectorized  and  translations  may  not  be  always  straight 
forward.  INTEL  iPSC/2  contains  the  vectorizing  compiler, 
VAST2 .  The  VAST2  compiler  supports  only  Fortran  programs  and 
is  limited  to  vectorizing  only  do  loops  and  if  statements . 
(iPSC/2  VAST2  User's  Guide,  1989) 
2 .   Distributing  Computations 

By  parallelizing  tasks  on  individual  processors, 
Vectorization  provides  only  the  first  level  of  parallelism. 
In  order  to  partition  a  program  into  parallel  tasks  to 
distribute  among  the  processors  of  a  multi-computer,  a 
different  strategy  is  needed.  Although  there  exist  many 
commercially-available  vectorizing  compilers,  compilers  which 
identify  higher  levels  of  parallelism  have  not  been  as 
successful.  Therefore,  the  task  of  developing  an  algorithm  to 
efficiently  distribute  computations  among  several  processors 
is  left  to  the  user. 

Performance  of  parallel  algorithms  may  be  radically 
different  for  different  parallel  computers.    A  number  of 
factors  such  as  processor  speed,  memory  access  time,   and 
memory  capacity  can  affect  an  algorithm's  performance.  Hence, 
the  strategy  to  parallelize  an  algorithm  must  be  developed 
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with  a  specific  parallel  computer  in  mind.  The  multicomputer 
with  each  node  having  its  own  memory  provides  the  greatest 
flexibility  to  the  user.  For  a  multicomputer,  the  user  must 
partition  the  problem  among  the  processor  nodes .  The 
hypercube  topology  allows  the  user  to  use  the  natural  topology 
of  the  problem  to  decompose  the  problem  into  parallel 
processes .  A  process  is  defined  as  a  single  statement  or  a 
group  of  statements  which  are  a  self-contained  portion  of  the 
total  computations.  Using  the  INTEL  iPSC/2,  two  decomposition 
strategies  are  suggested  (iPSC/2  User's  Guide,  1990.  pp.  4-1  - 
4-6)  : 

•  Control  Decomposition 

•  Domain  Decomposition 

a.  Control   Decomposition 

Control  decomposition  is  the  strategy  of  dividing 
tasks  or  processes  among  the  individual  processors  (nodes) . 
This  strategy  incorporates  a  divide  and  conquer  approach. 
Control  decomposition  is  recommended  for  problems  with 
irregular  data  structures  or  unpredictable  control  flows . 

One  method  of  control  decomposition  is  for  the 
parallel  program  to  self-schedule  tasks.  For  this  method  one 
node  assumes  the  role  of  a  manager  with  the  remaining  nodes 
assuming  roles  as  workers.  The  managing  node  maintains  a  list 
of  processes  to  be  accomplished  and  assigns  a  processes  to  the 
working  nodes.    The  working  nodes  request   jobs,   receive 
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processes,  and  perform  the  indicated  tasks.  Implied  in  the 
self-scheduling  method  is  the  cost  of  one  processor  to  perform 
the  manager  duties.  (iPSC/2  User's  Guide,  1990,  p.  4-4) 

A  second  method  of  control  decomposition  is  to 
pre-schedule  the  processes .  The  exact  tasks  required  of  each 
node  are  explicitly  stated  in  the  parallel  program.  Although 
this  method  saves  the  cost  of  the  managing  node,  care  must  be 
taken  to  ensure  the  processes  are  evenly  distributed  among  the 
nodes . 

b.       Domain  Decomposition 

Domain  decomposition  is  the  strategy  of  dividing 
the  input  data  or  domain  among  the  nodes .  The  partitioned 
sets  of  domain  may  be  specific  data  sets  such  as  blocks  of 
matrix  or  represent  a  specific  grid  such  as  used  in  finite 
difference  or  finite  element  methods  to  solve  partial 
differential  equations.  The  major  difference  between  control 
and  domain  composition  is  that  domain  decomposition  strategy 
requires  each  node  to  perform  essentially  the  same  tasks  but 
with  different  input  data. 

Domain  decomposition  is  recommended  if  the 
calculations  are  based  on  a  large  data  structure  and  the 
amount  of  work  is  the  same  for  each  node.  An  example  of 
domain  decomposition  is  multiplying  two  large  matrices  by 
block  multiplication.  Although  domain  decomposition  may  seem 
perfectly  parallelizable  and  thereby  very  efficient,  user  must 
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use  caution  to  ensure  each  input  data  set  requires  essentially 
the  same  amount  of  work. 

3 .   Improving  Performance 

The  decomposition  of  a  problem  may  require  the  use  of 
the  control,  domain  or  a  hybrid  of  both  strategies  to  be 
efficient.  Once  a  specific  strategy  is  chosen,  several 
factors  should  be  considered  to  improve  the  performance  of  the 
parallel  algorithm.   Those  factors  include: 

•  Load  balance 

•  Communication  to  computation  ratio 

•  Sequential  bottlenecks 

a.  Load  Balance 

Load  balance  refers  to  the  degree  to  which  all 
nodes  are  active.  If  the  work  is  not  evenly  distributed  among 
the  nodes,  the  parallel  algorithm  will  show  constrained 
speedup.  Load  balancing  may  be  achieved  by  decreasing  the 
grain  size  of  the  parallel  tasks,  self-scheduling  tasks,  or 
redistributing  the  domain.  Grain  size  refers  to  the  relative 
amount  of  work  completed  in  parallel .  Pipelined  vector 
operations  is  an  example  of  small  grain  parallel  computing  and 
distributing  computations  may  be  considered  as  large  grain 
parallel  computing. 

b .  Communication   to   computation   ratio 
Communications    to   computation   ratio   is  the  ratio 

between  the  time  spent  communicating  and  the  time  spent 
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computing.  Except  for  perfectly  parallel  problems,  time  lost 
for  communications  is  inherent  in  parallel  algorithms .  A 
large  communication  to  computation  ratio  constrains  a  parallel 
program's  performance.  The  objective  is  to  maximize  the  time 
a  node  spends  computing  and  to  minimize  the  time  spent 
communicating.  Reductions  in  the  communication  to  computation 
ratio  may  be  accomplished  by  increasing  the  grain  size, 
grouping  messages,  or  recalculating  values  instead  of 
receiving  the  value  from  another  node . 
c.   Sequential    Bottlenecks 

Sometimes  tasks  cannot  begin  until  completion  of 
a  previous  task,  limiting  number  of  tasks  that  can  be 
completed  in  parallel .  A  sequential  bottleneck  is  the 
circumstance  of  other  processors  waiting  for  another  processor 
to  complete  a  task  before  they  may  continue.  The  portion  of 
operations  that  are  not  completed  in  parallel  can 
substantially  restrict  speedup  as  can  be  seen  by  Amdahl's  Law 
(Equation  3.3) .  Inherent  in  sequential  bottlenecks  are  any 
requirements  of  the  nodes  to  synchronize.  The  only  method  to 
remove  sequential  bottlenecks  is  to  modify  or  reorder  the 
algorithm  in  order  to  overlap  sequential  code  with  other 
computations . 
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IV.  PARALLELIZATION  OF  PPT2 

The  purpose  of  this  research  was  to  determine  the 
potential  reduction  in  computation  time  for  the  NAVSPASUR 
satellite  motion  model  through  parallel  computing.  This 
potential  may  be  assessed  by  determining  the  relative  speedup 
and  efficiency  of  various  parallel  algorithms  employing  the 
methods  and  strategies  of  parallelization  discussed  in  Chapter 
III. 

As  stated  in  the  previous  chapter,  the  strategy  for 
developing  parallel  algorithms  depends  heavily  on  the 
architecture  and  topology  of  the  parallel  computer  used.  Due 
to  ease  of  access  and  familiarity  with  the  INTEL  iPSC/2 
hypercube,  the  parallel  computing  potential  "of  the  NAVSPASUR 
model  was  assessed  with  respect  to  implementing  the  model  on 
this  specific  multi-computer.  Although  performance  of  various 
algorithms  may  vary  somewhat  depending  on  the  specific 
parallel  computer,  it  was  hoped  that  some  generalizations  may 
be  made  from  the  application  of  the  NAVSPASUR  model  to  this 
hypercube . 

A.   VECTORIZATION 

The  first  method  of  parallelization  considered  for  the 
NAVSPASUR  model  was  vectorization .  Vectorization  is  usually 
simpler  than  the  other  methods  of  parallelization  to  apply. 
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Additionally ,  if  vectorization  proved  to  be  beneficial,  it  may 
be  incorporated  with  the  other  parallel  computing  methods  in 
order  to  realize  even  greater  speedup  and  efficiency. 

The  realized  speedup  due  to  vectorization  is  a  function  of 
the  number  of  vector  operations  within  a  specific  algorithm. 
Vector  operations  in  Fortran  are  usually  characterized  by  do 
loops  containing  scalar  operations  performed  on  each  element 
of  an  array.  With  each  node  possessing  its  own  vector  co- 
processor (VX  module) ,  these  do  loops  may  be  replaced  by 
single  calls  to  canned  subroutines.  These  subroutines  utilize 
a  vector  co-processor  to  perform  pipelined  vector  operations, 
significantly  reducing  computation  time.  In  addition  to  the 
explicit  vector  operations  within  an  algorithm,  sometimes 
there  exist  blocks  of  scalar  operations  that  may  easily  be 
transformed  into  vector  operations.  Scalar  operations 
contained  within  Fortran  do  loops  and  logical  if  statements 
are  usually  good  candidates . 

Analysis  of  the  Fortran  subroutine  PPT2  shows  that  the 
current  subroutine  contains  very  few  explicit  or  implicit 
vector  operations .  The  only  apparent  vector  operation  in  the 
satellite  state  vector  prediction  portion  of  PPT2  is  the 
computation  of  the  velocity  vector  at  the  very  end  of  the 
algorithm.  The  propagation  of  the  orbital  element  set 
comprises  the  majority  of  the  computations.  The  formulas  used 
to  propagate  the  orbital  elements,  presented  in  Chapter  II, 
may  be  characterized  as  lengthy,  algebraically-complex,  non- 
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linear  scalar  functions  of  the  mean  orbital  elements . 
Attempts  to  transform  these  formulas  into  a  set  of  vector 
operations  quickly  become  algebraically  overwhelming  and  a 
successful  transformation  is  highly  improbable. 

Likewise,  with  the  exception  of  the  computation  of  the 
variable  T2,  the  scalar  operations  contained  in  the  do  loops 
and  if  statements  of  PPT2  demonstrate  limited  vectorizing 
potential  due  to  data  dependency  within  the  loops  and 
statements.  Therefore,  based  on  this  initial  assessment  of 
limited  vectorizing  potential,  vectorization  was  not 
considered  as  a  viable  method  to  reduce  computation  time  and 
efforts  to  vectorize  PPT2  were  pursued  no  further. 

B.   DISTRIBUTION  OF  COMPUTATIONS 

With  vectorization  deemed  as  not  a  viable  method  of 
parallel  computing  for  the  NAVSPASUR  model,  any  reduction  in 
computation  time  needed  to  be  achieved  through  the  method  of 
distributing  computations .  Both  strategies  of  control 
decomposition   and  domain   decomposition   were  considered. 

In  order  to  better  appraise  the  potential  reduction  of 
computation  time  by  implementing  each  strategy,  separate 
parallel  algorithms  utilizing  the  different  strategies  were 
developed  and  evaluated  with  respect  to  the  measures  of 
speedup  and  efficiency.  Although  a  combination  of  both 
strategies  may  possibly  provide  the  greatest  speedup  and 
efficiency,  the  evaluation  of  separate  algorithms  implementing 
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the  respective  strategies  exclusively  provided  a  better  means 
to  determine  how  the  majority  of  the  reduction  in  computation 
time  was  achieved.  Additionally,  once  the  relative  benefit  of 
each  strategy  is  determined,  it  would  not  be  difficult  to 
incorporate  both  algorithms  together  on  the  hypercube . 

Using  the  two  distinct  strategies,  two  separate  sets  of 
programs  were  developed  and  evaluated.  Each  program  set 
consists  of  two  Fortran  programs  to  be  executed  on  the  INTEL 
iPSC/2 .  A  host  (system  resource  manager)  program  acquires  a 
specified  size  cube;  loads  the  node  program  on  the  processors 
of  the  attached  cube;  and,  upon  completion  of  the  algorithm, 
releases  the  cube  for  another  application.  The  node  program 
implements  the  parallel  algorithm.  Although  the  host  may  also 
serve  as  an  additional  processor,  the  parallel  algorithms 
utilized  only  the  nodes  of  the  iPSC/2.5 

Program  set  named  P3T-4  implements  an  algorithm  using  the 
control  decomposition  strategy  and  the  program  set  named  P3T 
implements  an  algorithm  using  the  domain  decomposition 
strategy.  Descriptions  of  the  algorithms  and  an  assessment  of 
their  respective  results  are  contained  in  the  subsections 
below. 


5The  routine  used  to  determined  run  times  for  the  various 
programs  is  measured  differently  for  the  host  and  the  nodes . 
In  order  to  obtain  comparable  times  to  compute  speedup  and 
efficiency,  the  actual  parallel  algorithms  utilize  only  the 
node  processors.  (iPSC/2  Programmer's  Reference  Manual,  1990, 
p.  3-174.) 
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1 .   Control  Decomposition  —  P3T-4 

The  strategy  of  control  decomposition  is  to  reduce  the 
NAVSPASUR  model's  computation  time  by  the  concurrent 
completion  of  separate  tasks  (processes)  by  the  individual 
nodes  of  the  hypercube .  By  reducing  the  computation  time 
necessary  to  predict  each  individual  satellite's  state  vector, 
an  overall  reduction  in  computation  time  to  predict  state 
vectors  for  all  tracked  objects  can  be  achieved.  Hence,  the 
ultimate  objective  of  the  program  set,  P3T-4,  was  to  reduce 
the  computation  time  for  a  single  object  in  orbit. 

a.  Algorithm 

In  order  to  predict  a  satellite's  state  vector 
considering  the  secular  and  periodic  correction  terms  due  to 
the  zonal  harmonics  and  a  correction  term  for  each  element  due 
to  the  sectoral  harmonics,  the  NAVSPASUR  model  requires  the 
completion  of  55  major  tasks.  The  majority  of  tasks  are 
evaluation  of  the  formulas  outlined  in  Chapter  II,  some  tasks 
are  a  group  of  computations  such  as  the  group  of  computations 
necessary  to  compute  the  variable  T2  or  the  group  of 
computations  to  solve  Kepler's  Equation  by  Stef f ensen' s 
Method. 

The  first  step  in  partitioning  these  tasks  among 
the  nodes  was  to  determine  which  tasks  could  be  completed 
concurrently.  Concurrency  was  determined  by  the  development 
of  a  hierarchy  of  the  formulas  used  by  the  NAVSPASUR  model. 
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Each  of  the  individual  tasks  were  listed  with  its  respective 
required  input.  Tasks  which  required  output  from  the 
completion  of  other  tasks  were  listed  below  those  tasks . 
Tasks  which  could  be  executed  concurrently  were  listed  on  the 
same  level  of  the  hierarchy.  Figures  4.1  and  4.2  contain  an 
extract  of  this  hierarchy. 

From  this  hierarchy  of  formulas,  the  number  of 
tasks  that  could  be  completed  concurrently  at  each  level  of 
the  hierarchy  ranges  from  2  to  14.  The  levels  where  only  a 
few  (less  than  four)  represent  potential  sequential 
bottlenecks .  These  sequential  bottlenecks  needed  to  be 
overcome  for  a  high  level  of  efficiency  to  be  achieved. 

Additionally,  the  number  of  FLOPS  required  varied 
considerably  among  the  tasks.  Some  tasks  required  as  few  as 
2  FLOPS,  while  other  tasks  required  over  200  FLOPS.  For 
example,  solving  Kepler's  Equation  by  Stef fensen' s  Method 
could  require  as  few  as  3  FLOPS  or  as  many  as  650  FLOPS  based 
on  the  speed  of  convergence.6  This  variance  in  the  number  of 
FLOPS  required  by  the  various  tasks  presented  a  potential 
problem  in  load  balancing. 

The  second  step  in  applying  this  strategy  was  to 
determine  the  method  of  scheduling  the  tasks  to  be 
accomplished  among  the  nodes .  A  manager-worker  algorithm  as 
described  in  Chapter  III  provides  an  easy  method  to  achieve 


6If  the  error  tolerance  of  less  than  10  8  is  not  met,  the 
NAVSPASUR  model  halts  Stef fensen' s  Method  after  20  iterations. 
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10 

t 
(2.37) 

(2.37) 

11 

f 
(2.37) 

(2.37) 

Figure  4 . 1  Hierarchy  of  NAVSPASUR  Formulas 


load  balancing  among  the  nodes .  However,  for  a  large  number 
of  small  tasks  as  is  the  case  with  the  NAVSPASUR  model,  this 
type  of  algorithm  can  become  communication  intensive.  A  large 
communication-to-computation  ratio  can  severely  limit  speedup 
and  could  possibly  cause  a  parallel  algorithm  run  longer  than 
the  original  serial  algorithm.   In  an  effort  to  reduce  the 
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11 

Figure  4.2  Hierarchy  of  NAVSPASUR  Formulas  (continued) 


amount  of  communication  in  the  algorithm,  an  algorithm  that 
pre-scheduled  tasks  was  chosen. 

With  pre-scheduled  algorithms,  each  node  knows  its 
own  tasks  to  accomplish  without  communicating  with  a  "manager" 
node.  Additionally,  the  absence  of  a  "manager"  node  frees  one 
more  node  to  assist  in  completing  the  required  tasks .   Despite 
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these  positive  points,  pre-scheduled  algorithms  are  not 
without  their  own  drawbacks.  Pre-scheduling  algorithms  do  not 
provide  automatic  load  balancing  as  is  the  case  with  self- 
scheduling  algorithms .  Care  must  be  taken  to  ensure  each  node 
does  essentially  the  same  amount  of  work.  Also,  pre- 
scheduling  algorithms  require  a  fixed  number  of  nodes . 
Requiring  a  fixed  number  of  nodes  restricts  the  flexibility  of 
algorithm  and  its  potential  speedup. 

The  third  step  in  applying  this  strategy  was  to 
determine  the  number  of  nodes  for  pre-scheduling.  Factors  in 
determining  the  number  of  nodes  or  cube  size  include  the 
potential  requisite  speedup,  the  amount  of  computation 
completed  between  communication  messages,  potential  sequential 
bottlenecks,  the  number  of  messages  required,  and  efficient 
use  of  all  of  the  nodes .  In  an  attempt  to  achieve  an 
appreciable  speedup,  the  algorithm  was  developed  to  use  a 
minimum  of  four  nodes . 

The  final  step  in  applying  this  strategy  was 
assigning  specific  tasks  to  each  node.  Load  balancing  and 
potential  synchronization  problems  were  considered  in  the 
assignment  of  the  tasks  to  the  respective  nodes.  Often, 
communication  distances  are  also  considered  in  assigning  tasks 
to  the  nodes;  however,  with  such  a  small  number  of  nodes  the 
communication  distances  were  negligible.  The  diagram  in 
Figure  4.3  depicts  how  the  tasks  were  distributed  among  the 
four  nodes.   Some  of  the  smaller  initial  tasks  were  duplicated 
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by  several  nodes  in  order  to  limit  the  amount  of  communication 
and  eliminate  potential   synchronization  problems   at   the 
beginning  of  the  algorithm.   A  complete  listing  of  the  source 
code  for  program  set,  P3T-4,  is  contained  in  Appendix  C. 
b.      Assessment 

To  establish  a  baseline  to  compare  the  performance 
of  the  parallel  program  sets  with  the  serial  subroutine  PPT2, 
execution  times  for  PPT2  to  predict  the  position  of  a  single 
satellite  and  a  set  of  satellites,  ranging  from  12  to  20736, 
were  measured.  The  measured  times  were  the  elapsed  time  for 
the  execution  of  PPT2  in  milliseconds  on  a  single  node  using 
the  node's  internal  clock.  Using  ten  different  sets  of 
satellite  data,  the  mean  execution  time  for  propagating  a 
single  satellite  was  11.2  milliseconds.  The  mean  execution 
time  for  PPT2  to  propagate  12  to  20736  satellites  is  depicted 
in  Figure  4.4.  These  mean  execution  times  were  used  to 
compute  the  speedup  and  efficiency  of  both  parallel  program 
sets . 

(1)  Results .  Program  set  P3T-4  was  executed  with 
the  same  sample  satellite  sets  as  were  used  with  PPT2 .  The 
graph  in  Figure  4.5  shows  a  comparison  of  the  mean  executions 
of  P3T-4  and  PPT2  for  a  various  number  of  satellites.  As  one 
can  see,  P3T— 4  was  nearly  two  times  slower  than  the  original 
serial  subroutine . 
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Node  0                 Node  1                   Node  2                Node  3 

Recover  a" 

Compute  secular 
corrections  -  I,  a,  and 

e 

Compute  long  period 
correction  —  1 

Compute  short  period 
corrections  -  1  and  a 

Solve  Kepler's 
Equation 

Collect  all  terms 

Compute  state  vector 

Compute  secular 
correction  —  g 

Compute  long  period 
corrections  —  e  and  I 

Compute  short  period 
corrections  -  e  and  [ 

Compute  T2 

Compute  long  period 
correction  —  z 

Solve  Kepler's 
Equation 

Compute  short  period 
corrections  —  z 

Compute  secular 
correction  -  h 

Compute  sectoral 
terms 

Compute  long  period 
correction  —  h 

Compute  short  period 
correction  —  h 

Figure  4 . 3  P3T-4  Algorithm 


For  a  single  satellite,  the  mean  execution  time 
for  P3T-4  was  23.3  milliseconds  as  compared  to  only  11.2 
milliseconds  for  PPT2 .  A  closer  look  at  where  the  time  is 
spent  reveals  the  shortcomings  of  this  parallel  algorithm. 
Table  4 . 1  shows  a  comparison  of  mean  execution  times  for  the 
one  node  executing  PPT2  and  the  four  nodes  executing  P3T-4 . 
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Figure  4.4  PPT2  Execution  Times 


The  times  expressed  in  the  table  are  the  mean  for  the  ten 
sample  satellites.  The  communication  time,  tB,  includes  the 
time  spent  sending  and  receiving  messages,  plus  the  time  spent 
waiting  for  messages  to  arrive.  The  computing  time,  ta,  is 
the  time  each  node  spent  completing  its  respective  tasks. 
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Figure  4.5  P  T-4  Execution  Times 

As  seen  by  the  times  listed  in  Table  4.1,  two 
problems  with  the  algorithm  become  evident.  First, 
communication  time  outweighs  the  actual  computation  time  for 
each  node.  The  causes  of  the  long  communication  time  are 
number  of  messages  required  by  this  specific  partition  of 
tasks  and  synchronization  problem  of  nodes  waiting  to  receive 
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Table  4.1  P3T-4  Execution  Time  Breakdown 


Algorithm 

tc 

(milliseconds) 

tm 
(milliseconds) 

t 
(milliseconds) 

PPT2 

(one  node) 

11.2 

NA 

11.2 

P3T-4 

node  0 

4.3 

19.0 

23.3 

node  1 

2.2 

15.9 

18.1 

node  2 

2.7 

14.7 

17.4 

node  3 

5.8 

15.7 

21.5 

computed  values  from  other  nodes.  Second  and  most 
importantly,  although  the  actual  computation  time  was  reduced, 
the  total  execution  time  of  the  parallel  algorithm  is  longer 
than  the  serial  algorithm  implemented  by  PPT2 . 

(2)  Improvements .  The  major  source  of  the  problem 
is  the  communication  to  computation  ratio.  This  parallel 
algorithm  using  four  nodes  requires  23  messages  among  the 
nodes.  The  NAVSPASUR  model  is  not  computationally  intensive 
enough  to  offset  this  amount  of  communication.  To  improve 
performance  this  ratio  must  be  reduced. 

One  method  to  reduce  the  communication  to 
computation  ratio  is  to  reduce  the  amount  of  communication. 
One  way  to  reduce  the  number  of  communications  is  to 
restructure  the  partitions.  However,  other  partitions  using 
four  nodes  were  analyzed,  yet  none  could  significantly  reduce 
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the  number  of  messages.  One  alternative  to  possibly  achieve 
any  speedup  was  to  partition  the  computations  among  fewer 
nodes.  The  diagram  in  Figure  4.6  depicts  the  distribution  of 
tasks  for  a  two  node  algorithm  P3T-2 .  Although  the  algorithm 
displayed  potential  in  reducing  the  total  execution  time  to 
less  than  PPT2,  speedup  would  be  further  bounded  by  two.  A 
speedup  of  two  would  not  outweigh  the  costs  in  procuring  a 
parallel  computer  merely  for  satellite  propagation. 

A  second  alternative  for  reducing  the 
communication  to  computation  ratio  is  to  somehow  increase  the 
amount  of  computation  between  messages .  The  amount  of 
computation  could  be  increased  by  computing  the  intermediate 
values  for  n  satellites  in  an  array  and  sending  the  array  in 
one  message.  The  communication  would  remain  essentially 
constant  and  the  computation  between  messages  would  increase 
by  a  factor  of  n.  An  estimate  of  this  improvement  may  be  made 
for  the  mean  times  in  Table  4.1  using  speedup  and  efficiency. 
From  Chapter  III,  efficiency  is  expressed  as  the  following: 

E=_£p=_tu>  (4.i) 

p    pt(p) 

where 

t(p)=tc(p) +tm(p)  (4.2) 

If  En  is  the  efficiency  of  the  P3T-4  algorithm  computing  n 
satellites'  values  between  messages,  then 
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NodeO 

Nodel 

Recover  a" 

Compute  T2 

Compute  secular 
collections  —  1,  a,  and 
e 

Compute  secular 
corrections  —  g  and  h 

Solve  Kepler's 
Equation 

Compute  sectoral 

terms 

Compute  long  period 
corrections  —  1,  e,  and 
I 

Compute  long  period 
corrections  —  z  and  h 

Compute  short  period 
corrections  —  1,  e,  a, 
and  I 

Compute  short  period 
corrections  —  z  and  h 

- 

Solve  Kepler's 
Equation 

Collect  all  terms 

Compute  state  vector 

Figure    4 .  6   P3T-2   Algorithm 


E  _-. 


nt(l) 


(4.3) 


■    p(nt.(p)*t.(p>) 

Solving  Equation  4.1  for  t(l)  and  substituting  into  Equation 
4 . 3  yields 
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..*<t,(p)+t.(p)), 

n      (nte(p)+t.(p)) 


Replacing   E   by 


I- _L<±> (4.5) 

P(tc(P)+tm(p)) 


and  simplifying  yields 

t(l) 


£  =• 


tm(P)  (4.6) 

p(te(p)+_=Jp_) 

Take  the  limit  as  n   goes  to  infinity  and  the  upper  bound  for 
En  is 

lint  E =  t(1)  (4.7) 

"~     "    Ptc{p) 

Setting  p  equal  to  four  and  using  values  from  Figure  4.1,  En 
is  bounded  by  .48.   This  implies  the  maximum  speedup  of  the 
modified  algorithm  would  be  bounded  by  1.92.    Again,  the 
benefit  of  using  this  strategy  is  quite  limited. 
2 .   Domain  Decomposition  —  P3T 

The  strategy  of  domain  decomposition  is  to  reduce  the 
NAVSPASUR  model's  computation  time  by  the  concurrent 
computation  of  several  satellites'  state  vectors.  Each  node 
of  the  hypercube  would  complete  identical  tasks  on  different 
satellite  data  sets,  simultaneously.  Hence,  the  ultimate 
objective  of  program  set  P3T  was  to  reduce  the  overall 
computation  time  for  several  objects  in  orbit. 
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a.   Algorithm 

Unlike  the  application  of  the  control 
decomposition  strategy,  the  application  of  the  domain 
decomposition  strategy  to  the  NAVSPASUR  model  was  seemingly 
less  arduous.  First,  because  each  node  propagates  satellite 
data  sets  independent  of  the  other  nodes,  there  exists  no 
requirement  for  communication  or  synchronization  among  the 
nodes.  This  lack  of  communication  simplifies  the  load 
balancing  and  sequential  bottleneck  problems  present  in  the 
P3T-4  parallel  algorithm. 

Second,  because  each  node  may  perform  the 
satellite  state  vector  prediction  tasks  serially,  the  existing 
subroutine  PPT2  may  be  used  with  only  minor  modifications . 
Developing  a  parallel  algorithm  for  predicting  an  individual 
satellite' s  state  vector  was  a  major  task  for  the  control 
decomposition  strategy.  Additionally,  by  using  the  existing 
PPT2  code,  the  other  tasks  completed  by  PPT2  may  be  requested 
by  the  user  using  the  same  control  variables  as  used  by  the 
original  PPT2  subroutine.  The  P3T-4  program  set  was 
restricted  to  only  predicting  a  satellite's  state  vector. 

Finally,  by  using  the  serial  subroutine  PPT2,  this 
strategy  may  be  reduced  to  only  developing  an  algorithm  to 
distribute  the  data  in  a  timely  manner.  Maximum  efficiency 
will  be  achieved  if  the  nodes  do  not  have  to  wait  for 
satellite  data  to  propagate. 
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Intuitively,  this  strategy  seems  perfectly 
parallelizable .  Although  the  various  tasks  performed  by  PPT2 
require  different  computation  times,  the  total  execution  time 
for  each  node  will  be  essentially  the  same  if  it  is  assumed 
that  the  various  tasks  are  randomly  distributed  throughout  the 
input  data  sets .  The  concern  for  this  algorithm  was  the 
potential  sequential  bottlenecks  at  input/output  portions  of 
the  program  set.  Reading  and  writing  to  external  files  can  be 
very  time  consuming.  In  addition  to  the  actual  time  spent 
reading/writing  to  an  external  file,  a  certain  amount  of  time 
is  spent  to  access  the  file.  In  order  to  minimize  this  time, 
the  number  of  calls  to  read/write  to  a  file  should  be 
minimized. 

With  the  specific  iPSC/2  hypercube  available, 
input/output  is  completed  sequentially.  Each  node  must 
compete  with  the  other  nodes  to  read  and  write  to  external 
files.  To  minimize  time  lost  to  accessing  the  file  cataloging 
the  set  of  satellites,  a  node  was  devoted  to  both  the 
reading/distributing  of  input  satellite  data  and  to  the 
collecting/writing  of  the  results.  The  idea  of  using  a  single 
node  to  read  the  data  and  a  single  node  to  subsequent  write 
the  output  is  simple  to  implement  and  proved  to  be  fastest 
method  to  overcome  the  bottlenecks  with  the  input/output.  The 
remaining  nodes  of  the  hypercube  implement  the  NAVSPASUR  model 
using  a  slightly  modified  PPT2 .  The  diagram  in  Figure  4.5 
depicts  how  the  satellite  data  is  distributed.   The  cost  of 
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using  this  simple  algorithm  to  distribute  and  collect  the  data 
is  the  loss  of  two  nodes.  The  only  restriction  on  the  size  of 
the  hypercube  required  by  P3T  is  that  the  attached  cube  must 
contain  at  least  four  nodes  to  achieve  any  speedup.  A 
complete  listing  of  the  source  code  for  program  set,  P3T,  is 
contained  in  Appendix  D. 

h .      Assessment 

(1)  Results .  The  graph  in  Figure  4.8  depicts  the 
mean  execution  time  for  P3T  versus  the  number  of  satellites 
propagated  using  hypercubes  of  four  and  eight  nodes 
respectively.  P3T  was  successful  in  reducing  the  overall 
execution  time  to  propagate  several  satellites.  Table  4.2 
shows  the  speedup  and  efficiency  of  P3T  for  a  various  number 
of  satellites.  As  seen  in  Table  4.2,  the  speedup  achieved 
using  all  eight  nodes  of  the  hypercube  was  approximately  three 
times  larger  than  the  speedup  achieved  using  four  nodes .  With 
this  parallel  algorithm  using  six  "working"  nodes  for  an  eight 
processor  hypercube  and  only  two  "working"  nodes  for  a  four 
processor  hypercube,  an  increase  in  speedup  by  approximately 
a  factor  of  three  was  expected.  More  notable  was  the  increase 
in  efficiency  using  eight  versus  four  nodes .  The  efficiency 
increased  from  .45  to  .67.  This  increase  in  efficiency 
indicates  that  P3T  applied  to  a  hypercube  of  greater  dimension 
could  yield  even  greater  speedup  and  efficiency. 
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Figure  4 . 7  P3T  Algorithm 


Table  4.2  also  indicates  that  P3T  performance 
increased  somewhat  with  an  increase  in  the  total  number  of 
satellites  propagated.  Because  with  this  parallel  algorithm 
the  computation  to  communication  ratio  does  not  vary  with  the 
number  of  satellites,  this  small  increase  in  performance  must 
be  primarily  due  to  the  diminishing  impact  of  the  algorithm' s 
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Figure  4 . 8  P  T  Execution  Times 

overhead  on  total  execution  time.  This  overhead  includes  one 
additional  message  containing  the  total  number  of  satellites 
to  propagate  from  the  distributing  node  to  the  other  nodes; 
some  small  computations  by  working  nodes  to  determine  number 
of  data  sets  to  receive;  and  a  halting  message  sent  by  the 
collecting  node  to  the  host  once  all  of  the  nodes  are 
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Table  4.2  P3T  Performance  (satellite  position  prediction) 


p3T 

Number  of  Satellites 

Sp 

EP 

8  nodes 

20736 

5.31 

.66 

1728 

5.37 

.67 

144 

5.24 

.66 

12 

4.13 

.52 

4  nodes 

20736 

1.78 

.45 

1728 

1.80 

.45 

144 

1.79 

.45 

12 

1.66 

.33 

finished.  Because  these  additional  messages  and  computations 
are  only  completed  once  in  the  program,  the  time  cost 
associated  with  this  overhead  becomes  negligible  as  the  number 
of  satellites  propagated  is  increased.  The  speedup  and 
efficiency  remained  fairly  constant  for  greater  than  144 
satellites . 

To  estimate  the  impact  of  increasing  the  amount  of 
computation  on  P3T' s  speedup  and  efficiency,  the  execution 
time  to  predict  a  satellite's  position  and  compute  the  partial 
derivatives  of  the  orbital  elements  was  also  measured  for  PPT2 
and  P3T.  These  results  are  summarized  in  Table  4.3.  Both 
speedup  and  efficiency  improved  with  this  increase  in 
computation . 
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Table  4.3  P3T  Performance  (satellite  position  prediction  plus 
computation  of  partial  derivatives) 


P3T 

Number  of  Satellites 

Sp 

ep 

8  nodes 

20736 

5.48 

.69 

1728 

5.53 

.69 

144 

5.45 

.68 

12 

4.82 

.60 

4  nodes 

20736 

1.84 

.46 

1728 

1.86 

.47 

144 

1.86 

.47 

12 

1.82 

.46 

(2)  Improvements .  The  performance  results  of  this 
algorithm  using  only  four  and  eight  nodes  indicated  a 
potential  increase  in  both  speedup  and  efficiency  if  this 
algorithm  could  be  applied  to  a  hypercube  of  greater 
dimension.  Because  the  number  of  working  nodes  is  not  fixed 
for  this  algorithm,  P3T  could  be  applied  easily  to  any  size 
hypercube  with  no  modifications.  The  efficiency  of  the 
algorithm  should  increase  with  the  cube  dimension  until  the 
time  to  distribute  a  separate  satellite  data  to  each  working 
node  exceeds  the  time  required  by  node  to  propagate  a  single 
satellite.  Therefore,  a  possible  improvement  in  the 
algorithm' s  performance  can  be  achieved  by  applying  the 
algorithm  to  an  optimal  dimension  hypercube. 
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Because  the  hypercube  at  the  Naval  Postgraduate 
School  is  limited  to  eight  nodes,  a  model  was  used  to  estimate 
the  optimal  hypercube  dimension.  The  total  execution  time  for 
P3T  to  propagate  n  satellites  with  p  processors,  t (p) ,  can  be 
modeled  by  the  following  expression: 

t  <p)  =tWI  (p)  +tw2  (p)  +t.  (p)  (4.8) 

where  twl  (p)  is  the  time  the  last  node  must  wait  to  receive  its 
first  satellite  data  set,  tw2  (p)  is  the  total  time  the  last 
node  must  wait  to  receive  all  of  its  subsequent  satellite  data 
sets,  and  tc(p)  is  the  time  for  each  node  to  propagate  each 
share  of  the  n  satellites . 

As  described  in  previous  chapter,  the  iPSC/2  uses 
a  Direct-Connect  Module  (DCM) .  This  module  provides  an 
essentially  constant  startup  time  for  a  message  to  be  passed 
between  two  nodes  regardless  of  the  length  of  the  message 
path.  Hence,  the  time  to  send  a  message  between  two  nodes  is 
a  function  of  only  the  size  (number  of  bytes)  of  the  message. 
Because  all  messages  between  the  distributing  node  and  working 
nodes  are  of  a  constant  size  (674  bytes) ,  the  time  of  a  single 
message  between  the  distributing  node  and  each  working  node  is 
essentially  constant.  For  this  algorithm,  there  are  p-2 
working  nodes.  Denoting  the  time  to  send  a  single  message 
between  the  distributing  node  and  a  working  node  as  tm(l),  the 
twl  (p)  may  be  modeled  by  the  following: 
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tHl{p)  =[  {p-2)  -1]  tjl)  =  (p-3)  tm{l)  (4.9) 

In  order  to  determine  tm(l),  several  experiments  were  run 
using  the  specific  iPSC/2  located  at  the  Naval  Postgraduate 
School.  The  mean  value  of  tra(l)  was  approximately  .693 
milliseconds . 

A  working  node's  total  wait  time  for  subsequent 
satellite  data  sets,  tw2  (p)  r  is  a  function  of  the  elapsed  time 
for  the  working  node  to  propagate  a  single  satellite,  the 
elapsed  time  for  the  distributing  node  to  send  a  subsequent 
satellite  data  set  to  the  working  node,  and  the  number  of 
satellites  the  working  node  must  propagate.  Because  the 
distributing  node  distributes  the  data  while  the  working  nodes 
are  computing,  the  wait  time  is  zero  if  the  subsequent 
satellite  data  arrives  before  the  working  node  is  ready  to 
receive.7  If  the  subsequent  satellite  data  arrives  after  the 
node  is  finished  with  the  previous  satellite,  the  wait  time  is 
the  difference  between  the  computing  time  for  the  previous 
satellite  and  the  elapsed  time  for  the  distributing  node  to 
send  the  node  another  satellite  data  set.  Because  the 
distributing  node  must  send  a  data  set  to  each  of  the  other 
working  nodes  prior  to  sending  a  subsequent  data  set  to  the 
last  node,  the  elapsed  time  for  the  distributing  node  to  send 
another  data  set  may  be  also  modeled  by  tMl  (p)  in  Equation  4.9. 


7If  a  node  is  not  ready  to  receive  a  message  from  another 
node,  the  message  is  stored  in  a  local  buffer.  The  time  to 
read  from  this  buffer  is  negligible. 
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The  total  wait  time  is  then  the  wait  time  for  each  subsequent 
satellite  data  set  multiplied  by  the  number  of  data  sets 
received  by  each  working  node.  Hence,  by  letting  tl  represent 
the  time  to  propagate  a  single  satellite,  tw2  (p)  raay  be  modeled 
by  the  expression: 


tw2  (P)  =  < 


0  if  twAp)<tl 

wi^'         (4.10) 

(— ^-1)  (twl(p)-tl)     if  twl(p)>tl 
p-2 


Assuming  the  time  for  one  node  to  propagate  n 
satellites,  t(l),  is 

t(l)=n(tl) 

The  total  computation  time  for  each  working  node,  tc  (p) ,  may 
be  approximated  by  the  continuous  function: 

tc(P)  =  n(to)  <411> 

p-2 

Substituting  Equation  4.8  into  Equations  3.2  and 
3.3,  the  speedup  and  efficiency  using  a  total  of  p  processors 
may  be  modeled  by  the  following  expressions: 

s  =  t(l)=       n(tl) 


P     t(p)      tul{p)+tw2{p)+tc(p) 


wl   *X"/    "w2 

(4.12) 


_Sp_  n(tl) 


Ep~  P     P[twl(p)+tw2(p)+tc(p) 


Setting  tl  equal  to  11.2  milliseconds  and  tm(p)  equal  to  .693 
milliseconds,  t (p) ,  Sp,  and  Ep  were  computed  using  Equations 
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4.9  -  4.12.  Figures  4.9  and  4.10  depict  the  estimates  of 
t  (p)  ,  Sp,  and  Ep  for  propagating  1728  satellites  using  4  to 
1024  processors  (a  cube  dimension  of  2  to  10)  .  Using  the 
above  model,  P3T  is  capable  of  achieving  a  maximum  speedup 
over  16  with  a  corresponding  efficiency  of  approximately  90 
percent  for  a  hypercube  of  dimension  5  (25  nodes) .  A 
hypercube  of  dimension  4  achieves  a  speedup  of  nearly  14  and 
an  efficiency  of  almost  90  percent.  Although  these  graphs  are 
only  estimates  of  the  actual  values  of  speedup  and  efficiency, 
they  correspond  closely  to  the  actual  timed  results  for  four 
and  eight  node  size  hypercubes  and  provide  a  good  indication 
of  the  parallel  computing  potential  of  this  algorithm  for 
higher  dimension  hypercubes . 

Another  possible  improvement  to  this  domain 
decomposition  algorithm  is  to  eliminate  the  need  for  the 
distributing  and  collecting  nodes.  Although  the  iPSC/2 
located  at  the  Department  of  Mathematics,  Naval  Postgraduate 
School  is  not  capable  of  concurrent  input  and  output, 
concurrent  file  systems  are  available.  Separate  I/O  nodes 
allow  the  computing  nodes  of  a  hypercube  to  concurrently  read 
and  write  to  external  files .  A  concurrent  file  system  would 
eliminate  the  need  of  the  distributing  and  collecting  nodes . 
Additionally,  the  INTEL  Concurrent  File  System  (CFS)  allows 
for  a  common  file  pointer  to  be  maintained  among  the  computing 
nodes,  minimizing  overhead  in  algorithm.  Depending  on  the 
efficiency  of  the  I/O  nodes,  a  further  increase  in  the  overall 
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Figure  4 . 9   Estimated  Execution   Time   of  P3T   for  Various 
Hypercube  Sizes 


algorithm  could  be  expected.  For  further  information  on  the 
INTEL  Concurrent  File  System  see  (iPSC/2  User's  Guide,  1990, 
7-1  -  7-18) . 
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N-1728.  TM(1)-693.  T1-11.2 


Figure  4.10   Estimated  Speedup   and  Efficiency  of  P3T   for 
Various  Hypercube  Sizes 
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V.   CONCLUSIONS  AND  SUGGESTIONS  FOR  FURTHER  RESEARCH 

The  ultimate  objective  of  this  thesis  is  determine  the 
parallel  computing  potential  of  the  NAVSPASUR  satellite  motion 
model.  From  the  results  given  in  Chapter  IV,  vectorization 
and  control  decomposition  strategies  proved  not  beneficial  in 
significantly  reducing  the  computation  time  of  the  NAVSPASUR 
model .  Very  few  apparent  vector  operations  exist  in  the 
model,  and  any  attempt  to  transform  the  formulas  into  vector 
operations  became  algebraically  overwhelming.  Although  the 
analytic  formulas  of  the  NAVSPASUR  model  are  quite  lengthy  and 
complex,  they  proved  to  be  not  computationally  intensive 
enough  to  warrant  decomposition  of  the  algorithm  by  tasks. 

On  the  other  hand,  the  domain  decomposition  strategy 
showed  promise  if  the  satellites  are  propagated  in  a  batch 
mode.  The  P3T  algorithm  was  simple  to  apply.  The  algorithm 
provided  the  flexibility  to  vary  the  dimension  of  the 
hypercube  and  to  easily  modify  the  model  itself.  Although 
only  a  maximum  efficiency  of  . 67  was  achieved,  the  potential 
efficiency  was  artificially  bounded  by  the  number  of  nodes 
available  with  the  specific  hypercube  multi— computer  used. 
Having  a  maximum  of  only  eight  nodes  available,  the  efficiency 
of  P3T  is  bounded  above  by  .75.  Using  the  model  of  P3T' s  total 
execution  time  described  in  Chapter  IV,  it  was  shown  that  a 
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maximum  efficiency  of  just  under  90  percent  could  be  achieved 
with  a  hypercube  consisting  of  16  nodes.  The  corresponding 
speedup  factor  of  nearly  14  would  significantly  reduce  the 
time  to  predict  the  state  vectors  for  several  thousand 
satellites . 

The  success  of  P3T  manifests  several  possible  areas  of 
future  research.  One  area  would  be  to  apply  P3T  to  higher 
dimension  cubes  and  validate  the  estimates  of  speedup  and 
efficiency  using  more  than  eight  nodes.  Because  the  number  of 
working  nodes  is  not  fixed  for  this  algorithm,  P3T  could  be 
applied  easily  to  any  size  hypercube  with  no  modifications . 
Once  the  optimal  size  is  found,  one  could  attach  several  sub- 
cubes  of  the  optimal  dimension  and  determine  the  benefit  of 
propagating  several  smaller  catalogs  of  satellite  data  instead 
of  propagating  one  large  catalog. 

Another  possible  area  of  research  would  be  to  modify  the 
current  satellite  motion  model  to  increase  the  accuracy  of  its 
predictions.  The  results  in  the  previous  chapter  showed  an 
increase  in  performance  of  the  P3T  if  the  amount  of 
computation  was  increased.  Hence,  greater  accuracy  could  be 
achieved  in  far  less  time  using  the  P3T  algorithm  than  the 
time  using  the  original  serial  algorithm.  Additionally,  from 
these  results,  the  parallel  computing  potential  of  satellite 
motion  models  that  are  more  computationally  intensive  would  be 
greater.   For  example,  semi-analytic  models  which  combine  the 
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benefits  of  analytic  and  numerical  models  might  be  good 
candidates . 

Overall,  the  main  lesson  learned  from  this  thesis  is  that 
satellite  position  prediction  can  be  made  more  timely  through 
parallel  computing.  Although  the  best  method  of 
parallelization  might  vary  depending  on  the  specific  model 
used,  parallel  computing  is  a  viable  option  to  achieve  timely 
satellite  position  prediction  for  the  growing  number  of 
objects  in  orbit  around  the  Earth. 
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APPENDIX  A 
INPUT/OUTPUT  OF  PPT2 

This  appendix  contains  a  listing  of  the  primary  variables 
used  by  NAVSPASUR  subroutine,  PPT2 .   Tables  A.l  -  A. 4  are 
extracted  from  (Solomon,  1991,  pp.  12  -  14) .   Table  A. 5  was 
interpreted  from  the  NAVSPASUR  source  code . 
Table  A.l  PPT2  Calling  List 


Variable 

Definition 

Input 

Output 

IND 

control  variable 

I 

TM 

time 

I 

0 

KZ 

0  for  inertial  coordinates 

1  for  Earth-fixed 

I 
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ible  A.  2  PPT2  Input 

Control  Variables 

Variable 

Option 

KF(1) 

IND 

0 
0 
0 
0 

<0 
0 
1 
2 

Position,  no  partial  derivatives 
Position,  partial  derivatives 
Secular  recovery 
Epoch  update 

KZ 

0 

1 

Inertial  coordinates 
Earth-fixed  coordinates 

KF(3) 

0 

1 

>1 

Secular  and  drag  corrections  only 

Secular,  drag,  and  periodic  correctioj 

Secular,  drag,  periodic,  and 
sectoral/tesseral  corrections 

KF  (2)  ,  KF  (4)  - 
KF(10) 

Not  used  by  PPT2 
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Table  A. 3  PPT2  Common  Blocks 


Block 

Variable 

Description 

Input 

Output 

CONS 

A(64) 

constants  set  by- 
subroutine  CONS1 

I 

PPT 

F(25) 

stored  mean  elements 

I 

0 

OSC(IO) 

osculating  elements 

0 

KF(10) 

control  variables 

I 

0 

CF  ( 1 0 ) 

used  for  appearance 

I 

0 

- 

prediction 

BS  (3,4) 

observation  stations 
position  (R, £, 6, ft) 

I 

U(3) 

r 

0 

V(3) 

V 

0 

W(3) 

w 

0 

R 

r 

0 

VEL(3) 

r,    velocity- 

0 

DCSUB 

PE(6,  8) 

partial  derivatives 

0 

.  .  . 

others  not  used  in 
PPT2 

FOREO 

RHO(3) 

P 

0 

RHOS 

P2 

0 

HDR 

*'P 

0 

HDV 

±-  r 

0 

RDV 

P± 

0 

DEL 

Af 

0 

ITER 

iteration  counter 

0 
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Table  A. 4  Array  F 


Index 

Variable 

Definition 

Input 

Output 

1 

1" 

mean  mean  anomaly 

I 

0 

2 

m 

mean  motion 

I 

0 

3 

M2 

first  decay  term 

I 

0 

4 

M3 

second  decay  term 

I 

0 

5 

e" 

eccentricity 

I 

0 

6 

g" 

mean  argument  of 
perigee 

I 

0 

7 

h" 

mean  ascending  node 

I 

0 

8 

cos  I" 

cosine  inclination 

I 

0 

9 

t 

epoch 

I 

0 

10 

# 

revolution  number 

I 

0 

11 

dg'Vdl" 

0 

12 

dh'Vdl" 

0 

13 

a" 

mean  semi-major  axis 

0 

14 

a 

0 

15 

sin  I" 

sine  inclination 

0 

16 

e 

0 

17-25 

not  used  in  PPT2 
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Table  A. 5   Array    OSC 


Index 

Variable 

Definition 

1 

cos  f 

cosine  true  anomaly 

2 

sin  f 

sine  true  anomaly 

3 

1 

osculating  mean  anomaly 

4 

e 

osculating  eccentricity 

5 

g 

osculating  argument  of  perigee 

6 

h 

osculating  ascending  node 

7 

cos  I 

cosine  osculating  inclination 

8 

a 

osculating  semi— major  axis 

9 

Al 

secular  and  drag  correction  term  for  1 

10 

not  used  by  PPT2 
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APPENDIX  B 
INTEL  iPSC/2  SPECIFICATIONS 


This  appendix  contains  a  summary  of  the  iPSC/2  hypercube 
multi-computer  specifications  as  described  in  (iPSC/2  User's 
Guide,  1990,  pp.  1-1  -  1-11) .  The  exact  performance  values 
were  obtained  from  (Arshi,  1988,  pp.  17  -  22) . 

iPSC/2 

System  Resource  Manager  (Host) 

Central  Processing  Unit     INTEL  80386  (4  MIP) 

Numeric  Processing  Unit     INTEL  80387  (250  KFLOP  64-bit) 


Memory 


External  Communication 


Operating  System 


8  Mbyte 

Ethernet  TCP/IP  local  area 
network  port 

AT&T  UNIX,  Version  V,  Release 
3.0 


Nodes 


Node  Processor 


Numeric  Co-processor 


Operating  System 


INTEL  80386  (4  MIP) 


INTEL  80387  (250  KFLOP  64-bit) 


NX/ 2 
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Internal  Communication 

Direct-Connect™  Routing    Message  Latency  —  350  (isec 

Message  Bandwidth  —  2.8 
Mbytes/sec 
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APPENDIX  C 
P3T-4  SOURCE  CODE  LISTING 

This  appendix  contains  the  listing  of  host  and  node 
programs  comprising  P3T-4  program  set.  The  host  program  reads 
the  satellite  data,  loads  the  node  program,  and  writes  the 
results  to  an  external  output  file .  The  node  program  contains 
the  instructions  for  the  four  nodes  to  complete  their 
respective  portions  of  the  P3T-4  parallel  algorithm.  The  node 
program  is  limited  to  only  satellite  state  vector  prediction. 

Host  Program: 

program  P3T4h 

• 

*  This  host  program  reads  the  satellite  data  from  a 

*  external  file,  loads  the  program  P3T-4n  on  the  hypercube 

*  nodes,  and  writes  results  to  external  output  file. 

* 

implicit  real*8  (a-h,o-z) 

dimension  sat (49, 6000) , end (49, 1) 
integer  pid,msglen, eof 

data  pid/0/,msglen/392/ 
data  isat/1/ 

call  setpid(pid) 

*  Load  node  program  P3T4n  on  nodes 

print *,' host  loading  nodes' 
call  load('p3t4n' ,-l,pid) 

*  Read  satellite  data  and  send  data  to  nodes  until  reach 

*  end-of-file 

open (unit=10, f ile=' /usr/phipps/in4' , form=' unformatted' ) 
read  (unit=10, iostat=eof )   (sat ( j, 1) , j=l, 49) 

20     if  (eof.ge.O)  then 

call  csend  (5,  sat  (1,  isat )  ,  msglen,  -1,  pid) 

isat=isat+l 

read  (unit=10, iostat=eof )  (sat  (  j, isat ) , j=l, 49) 

*  Receive  results  from  nodes 
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call  crecv (99, sat (1, isat-1) ,msglen) 
go  to  20 
endif 

Send  message  for  nodes  to  halt  and  clear  cube  for  next  process 

call  csend (5, end,msglen, -1, pid) 

close (unit=10) 

call  killcube  (-1,-1) 

Write  results  to  external  file 

open (unit=ll,  f ile=' /usr/phipps/out4' , form=' unformatted' ) 
do  22  i=l, isat-1 
22     write(ll,*)  (sat  (  j,i)  ,  j-1,49) 
close (unit=ll) 

stop 
end 
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Node  Program: 


program  P3T4n 

*  This  node  program  is  a  parallel  code  for  satellite 

*  position  prediction  using  the  NAVSPASUR  model.   This 

*  program  employs  four  nodes  using  the  control 

*  decomposition  strategy.   The  specific  tasks  for  each 

*  node  are  separated  by  logical  if  statements. 

implicit  real*8  (a-h,m-z) 
real*8  kz 

*  Declare  cube  specific  function  and  variables  as  integers 

integer  mynode , mclock , magwait , myhost 
integer  mynod, pid,msglen, hostid 

common  /ppt/f  (25)  ,osc(9)  ,u(3)  ,v(3)  ,w(3)  ,vel(3)  ,r,  tm,  kz 

common  /cons/c(25) 

common  /n/theta2, etaO, eta20, esqO 

common  /g/g2, g2p, g3, g4, g5 

common  /crit/t2 

common  /sec/agda, agde, agdi, agdl, agdg, agdh 

data  pid/0/,msglen/8/ 

hostid=myhost () 
mynod=mynode ( ) 

*  Synchronize  nodes 

call  gsync() 

*  Nodes  set  constants  and  receive  data  from  host 

call  consl 

call  crecv (5, f ,msglen*49) 

*  Nodes  continue  to  execute  ppt3  until  catalog  of 

*  satellite  data  is  exhausted 

1100     if (tm.eq.O.OdO)  go  to  1101 

*  Node  2  computes  new  T2  and  sends  to  other  nodes 

if (mynod. eq. 2) then 

call  critincl 

call  csend (mynod, t2,msglen, -1, pid) 
endif 

*  Nodes  execute  respective  portion  of  subroutine  ppt3 

call  ppt3 (mynod) 

*  Node  0  send  results  to  host 

if (mynod . eq . 0) then 

call  csend (99, f ,msglen*4  9, hostid, pid) 
endif 
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*  Repeat  until  catalog  of  satellite  data  is  exhausted 

call  crecv (5, f ,msglen*49) 
go  to  1100 

1101     continue 

*  End  main  program 

stop 
end 

•••A***************************************************** 
subroutine  ppt3 (mynod) 

*  This  subroutine  preschedules  the  tasks  for  each  node 

*  numbered  0  through  3.   Tasks  are  partitioned  by  logical 

*  if  statements 

*  Declare  variables 

implicit  real*8  (a-h,m-z) 
real*8  kz 

*  Declare  cube  specific  function  and  variables  as  integers 

integer  mynode,mclock,msgwait 
integer  mynod, pid,msglen 
integer  msg4 (10) 

dimension  msgl (2) ,msg3 (2) 

*  Define  common  blocks 

common  /ppt/f  (25)  ,osc(9)  ,  u  (3)  ,  v  (3)  ,  w  (3)  ,  vel  (3)  ,  r,  tm,  kz 

common  /cons/c (25) 

common  /n/theta2, etaO, eta20, esqO 

common  /g/g2, g2p, g3, g4, g5 

common  /crit/t2 

common  /sec/agda, agde, agdi, agdl, agdg, agdh 

equivalence  (msgl (1) , osc (6) ) , (msg3 (1) , osc (1) ) 
data  pid/0/,msglen/8/ 


if (mynod. ne . 2) then 
esq0=f (5) *f (5) 
theta2=f (8) *f (8) 
eta20=1.0d0-esq0 
etaO=dsqrt (eta20) 

*000000000000000000000000000000000000000000000000000000000 
if (mynod. eq. 0) then 


*    Post  asynchronous  receive  message  commands 
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msg4 (1) =irecv (2, t2,msglen) 
msg4 (2) =irecv (1, osc (5) ,msglen) 
msg4 (3) =irecv (3, agda,msglen*6) 

*  Recover  a  from  mean  orbital  elements 

call  recover 

*  Send  a  to  all  nodes 

call  csend (mynod, f (13) ,msglen, -1, pid) 

*  Compute  g2,  g2p,  g3, g4, g5 

call  gamma 
hl=tm-f (9) 

*  Compute  osculating  l,e,a 

osc(9)  =  (  (f  (4)  *hl  +  f  (3)  )  *hl  +  f  (2)  )  *hl 

osc  (3)=pie  (osc  (9)+f  (1)  ) 

f  (14)— 4.0d0*f  (13)  *f  (3)  /  (f  (2)  *3.0d0) 

f (16) =f (5) *eta20*f (14) /f (13) 

osc(4)=dminl (dmaxl (0 . OdO,  f (5) +f (16) *hl)  ,  .99999999d0) 

osc (8)=dmaxl (l.OdO, f(13)+f(14)*hl) 

*  Send  1,  e  and  a  to  other  nodes 

call  message (osc (3) , osc (4) , osc (8) , mynod, -1,0) 

*  Compute  sin  I 

f (15)=dsgrt (1 . 0d0-theta2) 

*  Receive  t2  from  node  2 

call  msgwait (msg4 (1) ) 

msg4 (4) =irecv (2,msg3, 2*msglen) 

*  Make  preliminary  computations  for  edll 

tt2=theta2*t2 

pl= (-8 . 0d0*tt2-3 . OdO) *theta2+l . OdO 

p2= (-40 . d0*tt2-ll . OdO) *theta2+l . OdO 

vlel-0  .  125d0*g2p*p2-  (5  ."0d0*g4*pl/  (12  .  0d0*g2p)  ) 

vlel=vlel*f (5) *eta20 

pi- (-24 . 0d0*tt2-9 . OdO) *theta2+l . OdO 

p2-(1.25d0+. 9375d0*esq0) *g5 

vle2=(pl*p2+g3) *eta20*f (15) / (4.0d0*g2p) 

vll2=vle2+3.0d0*eta20*0.15  625d0*esq0*f (15) *g5*pl/g2p 

pl= (-16 . 0d0*tt2-5 . OdO) *theta2+l . OdO 

vle3=pl*f (15) *eta20*esq0*g5*35.0d0/ (-3 . 84d02*g2p) 

*  Receive  g  from  node  1 

call  msgwait (msg4 (2) ) 
msg4 (5) =irecv (1, DE,msglen) 

*  Compute  edll 
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cosg=dcos (osc  (5) ) 
sing=dsin (osc  (5) ) 


pl=4 . 0d0*cosg**3-3 . 0d0*cosg 

p2=2 . Od0*sing*cosg 

edll= (vlel*p2-vll2*cosg-vle3*pl) *etaO 

Receive  cos  f,    sin  f  from  node  2 

call  msgwait (msg4 (4) ) 
msg4 (6) "irecv (2, OS,msglen) 

Compute  ed21 

w22=(1.0d0+osc (4) *osc (1) ) * (2+osc (4) *osc (1) ) /eta20 

pl=(3.0d0-4.0d0*osc(2) **2) *osc(2) * (1 . 0d0-2 . 0d0*sing**2) 

pl=(4.0d0*osc(l) **2-3.0d0) *osc(l) *p2+pl 

p3=(1.0d0-2.0d0*sing**2) *osc (2) +osc (1) *p2 

p4=3.0d0* (1.0d0-theta2) 

p5=-l . OdO+3 . 0d0*theta2 

eta30=eta20*eta0 

ed21= (pi* (w22+l . OdO/3 . OdO) +p3* (1 . 0d0-w22) ) *p4 

ed21= (ed21+osc (2) * (w22+l . OdO) *p5*2 . OdO) *eta30*g2p/-4 . OdO 

Send  ed21  to  node  2 

call  csend (mynod, ed21,msglen, 2, pid) 

Compute  edl  and  a 

edl=edll+ed21 

pl= ( cosg * co sg- sing* sing) * (osc (1) *osc (1) -osc (2)*osc(2))- 
&       p2*2.0d0*osc (1) *osc (2) 

p6=(1.0d0+osc(4) *osc(l) ) **3 

osc (8)=osc (8) * (1.0d0+g2p/eta20*  (p5* (p6-eta30)  + 
&  p4*p6*pl)) 


Receive  sectoral  corrections  from  node  3 

call  msgwait (msg4 (3) ) 

DL=edl+agdl 

osc  ( 8 ) =osc ( 8 ) +agda 

Receive  DE  from  node  1 

call  msgwait (msg4 (5) ) 

msg4 (7) =irecv (l,msgl,msglen*2) 

Compute  final  value  for  e  and  1 

esq=de**2+dl**2 

osc (4) =dsqrt (esq) 

sinl=dsin (osc (3)  ) 

cosl=dcos (osc (3)  ) 

osc  (3)  -=artnq  (DE*sinl+DL*cosl,  DE*cosl-DL*sinl) 

Solve  Kepler' s  eqn 

call  kepler 
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if (kz.ne.O.OdO) cf=c(12) 

eta2=1.0d0-esq 

r=osc (8) *eta2/ (l.OdO+osc (4) *osc (1) ) 

Receive  final  computations  from  other  processors 

call  msgwait (msg4 (7) ) 
call  msgwait (msg4 (6) ) 

Compute  g 

osc  (5) =os-osc (3) -osc (6) 

Compute  position  and  velocity  vectors 

cosg=dcos (osc (5) ) 
sing=dsin (osc (5) ) 
cosh=dcos (osc (6) ) 
sinh=dsin (osc (6) ) 

pl=sing*osc (1) +cosg*osc (2) 
p2=cosg*osc (1) -sing*osc (2) 
sini=dsqrt (1 . OdO-osc (7)  **2) 

u  (1)  «=cosh*p2-sinh*pl*osc  (7) 
u (2) — sinh*p2+cosh*pl*osc (7) 
u (3) =pl*sini 


v (1) =-cosh*pl-sinh*p2*osc  (7) 
v (2) =-sinh*pl+cosh*p2*osc  (7) 
v (3) =p2*sini 

w (1) =sinh*sini 
w (2) =-cosh*sini 
w (3) =osc  (7) 

p3=osc (4) *osc  (2) 
p4=osc (4) *osc (1)+I.0d0 
p5=dsqrt (eta2*osc (8) ) 

do  11  i-1,3 

11     vel  (i)  =  (p3*u(i)+p4*v(i)  ) /p5 
vel(l)=vel  (l)+cf*r*u(2) 
vel(2)-vel(2)  -cf*r*u(l) 

End  for  node  0 


endif 

*00000000000000000000000000000000000000000000000000000000 
*111 11 11 1111111111111111111111111111111111111111111111111 

*    Begin  node  1 

if (mynod. eq. 1) then 


msg4 (1) =irecv (0, f (13) ,msglen) 
msg4 (2) =irecv (2, t2,msglen) 
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msg4 (3) =irecv (3, adga, msglen*6) 

Compute  sin  i  and  tan  i 

f (15) =dsqrt (1 . 0d0-theta2) 

tani=f (15) /f (8) 

hl=tm-f (9) 

pl=l . OdO-5 . 0d0*theta2 

p2=-35 . OdO+24 . 0d0*eta0+25 . OdO*eta20 

p3=90 . OdO-192 . 0d0*eta0-12  6 . 0d0*eta20 

p4=385 . OdO+3  60 . 0d0*eta0+4  5 . 0d0*eta20 

p5=-270 . OdO+12  6 . 0d0*eta20 

p6=385.0d0-18  9.0d0*eta20 

theta4=theta2*theta2 

Receive  a  from  node  0 

call  msgwait (msg4 (1) ) 
Compute  g2,  g2pr g3, g4, g5 

call  gamma 

Compute  g 

osc(9)  =  (  (f  (4)  *hl+f  (3)  )  *hl  +  f  (2)  )  *hl 

f (11) =-1 . 5d0*g2p*pl+ .  09375d0*g2p**2* (p2+p3*theta2 
&         +p4*theta4)+.3125d0*g4* (21 . OdO-9 . 0d0*eta20+p5*theta2 
&         +p6*theta4) 

ql=f (2) *f  (13) **1.5d0 

f  (ll)=f  (ll)/ql 

osc(5)=pie(f  (6)+f  (11)  *osc(9)  ) 

Send  g  to  all  nodes 

call  csend (mynod, osc (5) ,msglen, -1, pid) 

Complete  preliminary  computations  to  solve  for  die  and  dli 

cosg=dcos  (osc  (5) ) 

sing=dsin (osc  (5) ) 

p3= (3 . OdO-4 . 0d0*sing**2) *sing 

p4=2 . OdO*cosg**2-l . OdO 

p5=2 . Od0*sing*cosg 

p6=cosg*cosg- sing* sing 

Receive  t2  from  node  2 

call  msgwait (msg4 (2) ) 

msg4 (4) =irecv (2, msg3,msglen*2) 

tt2=theta2*t2 

pl= (-8 . 0d0*tt2-3 . OdO) *theta2+l . OdO 

p2= (-40 . d0*tt2-ll . OdO) *theta2+l . OdO 

vlel=0.125d0*g2p*p2- (5 . OdO*g4*pl/ (12.0d0*g2p) ) 

vlel=vlel*f (5) *eta20 

pl= (-24 . 0d0*tt2-9. OdO) *theta2+l . OdO 

p2=(1.25d0+.9375d0*esq0) *g5 

vle2-(pl*p2+g3) *eta20*f (15) / (4.0d0*g2p) 

pl=(-16.0d0*tt2-5.0d0) *theta2+l . OdO 

vle3=pl*f (15) *eta20*esg0*g5*35 . OdO/ (-3 . 84d02*g2p) 
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Compute  die  and  dli 

dle=vlel*p4+vle2*sing+vle3*p3 
dli=-f (5) *dle/ (eta20*tani) 

Receive  1,  e  and  a  from  node  0 

call  message (osc (3) , osc (4) , osc (8) ,mynod, 0,1) 
Receive  cos  f,  sin  f  and  r  from  node  2 

call  msgwait (msg4 (4) ) 

Compute  d2i 

pl=(4.0d0*osc(l) **2-3.0d0) *osc (1) *p4 

pl=pl-p5* (3.0d0-4.0d0*osc (2) **2) *osc(2) 

p2=p4*osc (1) -p5*osc (2) 

p3=(osc (1) *osc (1) -osc (2) *osc (2) ) *p6-2 . 0d0*p5*osc (1) *osc(2) 

d2i=( (pl+3.0d0*p2) *f (5) +3 . 0d0*p3) *f (15) *f (8) *g2p 

Compute  d2e 

w20=osc (1) * (3.0d0+osc (4) *osc (1) * (3.0d0+osc (4) *osc (1) ) ) 

p4=eta0+l .OdO/ (1 . OdO+etaO) 

p5=1.0d0-theta2 

d2e=0.5d0*g2p* ( (3 . 0d0*theta2-l . OdO) * (w20+f (5) *p4) 
&       +3.0d0*p5* (w20+f (5) ) *p3-eta20*p5* (3 . 0d0*p2+pl) ) 

Compute  di  and  de 

di=dli+d2i 
de=dle+d2e 

Receive  sectoral  corrections 

call  msgwait (msg4 (3) ) 

Compute  DE  and  send  to  node  0 

DE=de+osc (4) +agde 

call  csend (mynod, DE,msglen, 0, pid) 

Compute  DI 

pl=(f (8)+1.0d0) /2.0d0 
p2=dsqrt (pi) 
p3=dsqrt (l.OdO-pl) 
DI=p3+.5dO*p2* (di+agdi) 

Receive  osc (6)  and  DH  from  node  3 

call  message (osc (6) r DH, dummy, mynod, 3,1) 

Compute  final  cos  i  and  h  and  send  to  node  0 

osc (7)=1.0d0-2.0d0*  (DI**2+DH**2) 

sinh=dsin (osc  (6) ) 

cosh=dcos (osc  (6) ) 

osc (6)  =artnq (DI*sinh+DH*cosh, DI*cosh-DH*sinh) 
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call  csend (mynod, msgl,msglen*2, 0, pid) 

*  End  of  node  1 

endif 

*111111111111111111111 1111111 111 11 11111111111111111111111 
*33333333333333333333333333333333333333333333333333333333 

*  Begin  node  3 

if (mynod. eq. 3) then 

msg4 (1) =irecv (0, f (13) ,msglen) 
msg4 (2) =irecv (2, t2,msglen) 
msg4 (3) =irecv (1, osc (5) ,msglen) 

sini2=l . 0d0-theta2 

f (15) =dsqrt (sini2) 

hl-tm-f (9) 

osc(9)=hl*  (f  (2)+hl*  (f  (3)+hl*f  (4)  )  ) 

theta4=theta2*theta2 

pl=l . OdO-5 . 0d0*theta2 

p2=-35 . OdO+24 . 0d0*eta0+25 . 0d0*eta20 

p3=90 . OdO-192 . 0d0*eta0-12  6 . 0d0*eta20 

p4=385 . OdO+360 . 0d0*eta0+45 . OdO*eta20 

p5=-270 . OdO+12  6 . 0d0*eta20 

p6=385 . OdO-18  9 . 0d0*eta20 

p7=-5.0d0+12.0d0*eta0+9.0d0*eta20 

p8=35 . OdO+36. 0d0*eta0+5 . 0d0*eta20 

p9=3 . OdO-7 . OdO*theta2 

*  Receive  a  from  node  0 

call  msgwait (msg4 (1) ) 

*  Compute  g2,  g2p, g3, g4,  g5 

call  gamma 

*  Compute  h 

f (12)=(-3.0d0*g2p+.375d0*g2p**2 
&         * (p7-p8*theta2)+1.25d0*g4* (5 . OdO-3 . 0d0*eta20) *p9) *f (8) 
ql-f (2) *f (13) **1.5d0 
f  (12)=f  (12)/ql 
if (kz.ne.O.OdO) cf=c(12) 

osc(6)=pie(f  (7)+f  (12)  *osc(9)  ) -cf  *  (tm-c(l)  ) 

*  Send  h  to  node  2 

call  csend (mynod, osc (6) ,msglen, 2, pid) 

*  Perform  preliminary  computations  for  sectoral  corrections 

f (Il)=-1.5d0*g2p*pl+.0  9375d0*g2p**2* (p2+p3*theta2 
&         +p4*theta4) +.3125d0*g4* (21 . OdO-9 . 0d0*eta20+p5*theta2 
&         +p6*theta4) 

f  (ll)=f  (ll)/ql 

agda=10.0d0 
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call  sector (hi) 

Receive  t2  from  node  2 

call  msgwait (msg4 (2) ) 

msg4 (4) =irecv (2,msg3,msglen*2) 

Complete  preliminary  computations  to  compute  sini  dlh 

tt2=theta2*t2 

pi- (40 . 0d0*tt2+16 . OdO) *tt2+3 . OdO 

p2= (2 . 0d02*tt2+80 . OdO) *tt2+ll . OdO 

vlhli=(5.0d0*g4*pl/ (12.0d0*g2p) - . 125*g2p*p2) *f (15) *esq0*f (8) 

p2=4 . OdO+3 . 0d0*esq0 

p3= (-24 . 0d0*tt2-9 . OdO) *theta2+l . OdO 

vlh2i=( (f (5) *f (8) *.25d0) /g2p) * (g3+ (5 . 0d0*g5/16 . OdO) *p2*p3 
&+15.0d0*g5* (f (15) **2) *p2*pl/8 . OdO) 

p2= (-8 . 0d0*tt2-2 . 5d0) *theta2+0 . 5d0 
p3=2.0d0*pl-1.0d0 

vlh3i=(-35.0d0*g5*esq0*f (5) *f (8) ) * (p2+p3*f  (15) *f  (15) ) 
&/ (576.0d0*g2p) 

Receive  1,  e  and  a  from  node  0 

call  message (osc (3) , osc (4)  , osc (8) ,mynod, 0, 1) 
Receive  g  from  node  1 

call  msgwait (msg4 (3) ) 

Compute  sectoral  corrections  and  send  to  all  nodes 

agda=0 . OdO 
agde=0 . OdO 
agdi=0.0d0 
agdl=0.0d0 
agdg=0 . OdO 
agdh=0 . OdO 

call  sector (hi) 

call  csend (mynod, agda,msglen*6,  -1,  pid) 

Compute  sini  dlh 

cosg=dcos (osc  (5) ) 
sing=dsin (osc (5) ) 
pl=2 . Od0*sing*cosg 
p2=(4.0d0*cosg**2-3.0d0) *cosg 
sinidlh=vlhli*pl+vlh2i*cosg+vlh3i*p2 

Receive  cosf,  sinf  from  node  2 

call  msgwait (msg4 (4) ) 

Compute  sini  d2h 

wl7=artnq (osc (2) , osc (1) ) +osc (4) *osc (2) -pie (osc (3)  ) 
p2=2 . 0d0*cosg**2-l . OdO 
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p3=(4.0d0*osc(l) **2-3.0d0) *osc(l) 

p4=(3.0d0-4.0d0*osc(2) **2) *osc(2) 

p5=pl* (6.0d0*osc(l) **2-3.0d0)+6.0d0*p2*osc(l)  *osc(2) 

p6=(pl*osc(l) +p2*osc(2) ) *3.0d0 

w21=p5+ (p6+pl*p3+p2*p4) *osc  (4) 

sinid2h=-0.5d0*g2p*f (8) *f (15) * (6 . 0d0*wl7-w21) 


Compute  DH  and  send  osc(6)  and  DH  to  node  1 

DH=0.5dO* (sinidlh+sinid2h+agdh) /dsqrt (0 . 5d0+0 . 5d0*f (8)  ) 
call  message (osc(6),DH,0. 0d0,mynod, 1,0) 

End  of  node  3 

endif 


*33333333333333333333333333333333333333333333333333333333 

else 
*222222222222222222222222222222222222222222222222222222222 

*  Begin  node  2 

msg4 (1) =irecv (0, f (13) , msglen) 
msg4 (2) =irecv (1, osc (5) ,msglen) 
msg4 (3) =irecv (3, osc (6) ,msglen) 

*  Perform  preliminary  computations  for  dlz 

f (15)=dsqrt (1 . 0d0-theta2) 

tt2=theta2*t2 

esq0=f (5) *f  (5) 

eta20=1.0d0-esq0 

etaO=dsqrt (eta20) 

eta30=eta20*eta0 

*  Receive  a  from  node  0 

call  msgwait (msg4 (1) ) 

*  Compute  g2,g2p,g3,g4  and  g5 

call  gamma 

*  Compute  dlz 

pl= (eta30-l . OdO) * . 125d0 

p2=( (-40.0d0*tt2-11.0d0) *theta2+l . OdO) *g2p 

p3=10.0d0*g4/ (3.0d0*g2p) 

p4=( (-8.0d0*tt2-3.0d0) *theta2+l) *p3 

p5=(20.0d0*tt2+8.0d0) *tt2+1.0d0 

p6= (10 . 0d0*p5+l . OdO) *g2p 

p7= (2 . 0d0*p5+l . OdO) *p3 

p8=25.0d0*esq0*tt2*tt2*theta2* (g2p- . 2d0*p3) 

p9=( (-2.0d02*tt2-33.0d0) *theta2+l . OdO) *g2p 

pl0=( (-40.0d0*tt2-9.0d0) *theta2+l . OdO) *p3 

vlsl=pl* (p2-p4) -.125d0*esq0*f (8) * (p6-p7) 
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&        +p8-.0625d0*esq0* (p9-pl0) 

pl=etaO+1.0dO/ (l.OdO+etaO) 

p2=f (8) / (l.OdO+f (8) ) 

p3=4 . OdO+3 . OdO*esqO 

p4= (-24 . 0d0*tt2-9 . OdO) *theta2+l . OdO 

p5= (esq0-eta30) *3 . OdO+11 . OdO 

p6= (20 . 0d0*tt2+8 . OdO) *tt2+l . OdO 

p7=(pl+p2) /4.0d0 

p8-(g3+.3125d0*p3*g5*p4) /g2p 

p9=p5* . 15625d0*g5*p4/g2p 

plO=(1.0dO-f (8) ) *.4  6875d0* (p6*2 . OdO+1 . OdO) *p3*g5*f (8) /g2p 

vls2=(p7*p8+p9+pl0) *f (5) *f (15) 

p7=3 . 0d0*eta30-3 . OdO- (2 . OdO+p2) *esqO 

p8= (-16 . 0d0*tt2-5 . OdO) *theta2+l . OdO 

p9=(1.0d0-f (8) ) *f (8) *. 0  607 6388 4d0*esq0* (1 . OdO+4 . 0d0*p6) 

vls3=(p7*.030381944d0*p8-p9) *f (15) *f (5) *g5/g2p 

Receive  g  from  node  1  and  compute  dlz 

call  msgwait (msg4 (2) ) 

sing=dsin  (osc (5) ) 

cosg=dcos  (osc (5) ) 

pll=2 . Od0*sing*cosg 

pl2=2 . 0d0*cosg**2-l . OdO 

pl3= (cosg*cosg-sing*sing) *cosg-2 . Od0*cosg*sing*sing 

dlz=vlsl*pll+vls2*cosg+vls3*pl3 

Receive  1,  e,  and  a  from  node  0 

call  message (osc (3) , osc (4) , osc (8) ,mynod, 0,1) 
msg4 (4) =irecv (0, ed21,msglen) 

Solve  Kepler' s  equation  and  send  cosf ,  sinf  to  all  nodes 

call  kepler 

call  csend (mynod,msg3,msglen*2,  -1,  pid) 

Complete  preliminary  computations  for  d2z 

wl7=artnq  (osc(2)  ,osc(l)  )+osc(4)  *osc(2)  -pie  (osc  (3)  ) 
p3=(4.0d0*osc(l) **2-3.0d0) *osc(l) 
p4=(3.0d0-4.0d0*osc(2) **2) *osc(2) 

p5=pll* (6.0d0*osc(l) **2-3.0d0)+6.0d0*pl2*osc(l) *osc(2) 
p6=(pll*osc(l)+pl2*osc(2) ) *3.0d0 
w21=p5+ (p6+pll*p3+pl2*p4) *osc (4) 
p7=(-5.0d0*theta2+2.0d0*f (8) +3. OdO) *w21 
p8=(-5.0d0*theta2+2*f (8)+1.0d0) *wl7 

Receive  ed21  from  node  0 

call  msgwait (msg4 (4) ) 
Compute  d2z 

d2z=-f (5) *ed21* (pl-l.OdO) /eta30- (6 . 0d0*p8-p7) *g2p*0.25d0 


102 


*  Receive  h  form  node  3 

call  msgwait (msg4 (3) ) 

*  Receive  sectoral  corrections 

call  crecv (3, agda,msglen*6) 

*  Compute  OS 

OS=osc (3) +osc (5) +osc (6) +dlz+d2z+agdg 

*  Send  OS  to  node  0 

call  csend (mynod, OS,msglen,  0,  pid) 

*  End  of  node  2 

*22222222222222222222222222222222222222222222222222222222 
endif 

*  Return  to  main  program 

return 

end 

********************************************************* 

function  pie (x) 

*  This  double  precision  function  computes  value  of 

*  x  mod  2*pi 

implicit  real*8  (a-h,m-z) 

pi2=6.2831853071d0 
pie=dmod (x, pi2) 
if  (pie)  90,91,91 

90  pie«=pie+pi2 

91  return 

end 
********************************************************** 

function  artnq(tl,t2) 

*  This  double  precision  function  computes  inverse  tangent  of  tl/t2 

*  for  range  0  to  2*pi 

implicit  real*8  (a-h,m-z) 

if (dabs (tl) -dabs (t2) ) 100, 104, 104 

100  artnq=datan(tl/t2) 

if (t2) 101, 102,102 

101  artnq=artnq+3 . 14159265358979d0 
go  to  105 

102  if  (tl) 103, 105,105 

103  artnq=artnq+6. 2 8318530717 95 9d0 
go  to  105 
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104  artnq=atan(-t2/tl) +1.5707 963267948 9d0 

if  (tl)  101,105,105 

105  return 

end 
*******  *********************************************  ****** 

subroutine  recover 

*  This  subroutine  recovers  the  value  of  a" 

*  iteratively  from  m 

implicit  real*8  (a-h,m-z) 
real*8  kz 

common  /ppt/f  (25)  ,osc(9)  ,u(3)  ,v(3)  ,  w  (3)  ,vel  (3)  ,r,  tm,  kz 

common  /cons/c(25) 

common  /n/theta2, eta, eta2, esq 

f (13) =f (2) ** (-2.0d0/3.0d0) 


do  110  i=l,5 

g2p=c(3) / (f (13) *eta2) **2 

g4=c(5) / (f (13) *eta2) **4 

pl= ( (35 . 0d0*theta2) -30 . OdO) *theta2+3 . OdO 

p2=25 . 0d0*eta2+144 . 0d0*eta+105 . OdO 

p3=-90 . 0d0*eta2-96 . 0d0*eta+30 . OdO 

p4=25 . OdO*eta+16 . 0d0*eta-15 . OdO 

p5=3 . 0d0*theta2-l . OdO 

110    f (13) - ( (1 . 0d0+l . 5d0*g2p*eta*p5+ . 09375d0*g2p**2 

&         * (p4+theta2* (p3+p2*theta2) ) +. 9375d0*g4*eta*esq*pl) 
&         /f (2) ) ** (2.0d0/3.0d0) 

return 
end 
********************************************************** 

subroutine  gamma 

*    This  subroutine  computes  the  dimensionless  quantities 

implicit  real*8  (a-h,m-z) 
real*8  kz 

common  /ppt/f  (25)  ,osc(9)  ,u(3)  ,v(3)  ,w(3)  ,vel(3)  ,r,  tm,  kz 

common  /cons/c(25) 

common  /n/theta2, etaO, eta20, esqO 

common  /g/g2, g2p, g3, g4, g5 

g2=c(3)  /f  (13)  **2 
g2p=g2/eta20**2 
g3=c(4) / (f (13) *eta20) **3 
g4=c(5) / (f  (13) *eta20) **4 
g5=c(6) / (f (13) *eta20) **5 

return 
end 
********************************************************** 
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subroutine  critincl 

*  This  subroutine  computes  the  value  of  T2 

implicit  real*8  (a-h,m-z) 
real*8  kz 

common  /ppt/f  (25)  , osc (9)  ,u(3)  ,v(3)  ,w(3)  , vel  (3)  ,r,  tm,  kz 
common  /n/theta2, etaO, eta20, esqO 
common  /crit/t2 

theta2=f (8) *f (8) 

tl-1 . OdO-5 . 0d0*theta2 

beta=1.0d02/ (2.0d0**ll) 

p3=beta*tl*tl 

pl=dexp (-p3) 

p2=1.0d0+pl 

plex=pl*pl 

p4=1.0d0 

p3ex=-0.5d0*p3 

do  210  i«2,13 

if  (i.le.ll)p2=p2* (l.OdO+plex) 
plex=plex*plex 
p4=p4+p3ex 
210      p3ex=-p3*p3ex/dble (i+1) 

t2=p2*p4*beta*tl 

return 

end 
********************************************************** 

subroutine  kepler 

*  This  subroutine  solves  Kepler's  Equation  using  Stef fenson' s 

*  Method  and  then  computes  cos  f  and  sin  f 

implicit  real*8  (a-h,m-z) 
real*8  kz 

common  /ppt/f  (25)  ,osc(9)  ,u(3)  ,v(3)  ,w(3)  fvel(3)  ,r,  tm,  kz 

e3=osc (3) 

do  410  i=l,20 
el=e3 

e3=osc (3) +osc (4) *dsin(e3) 
if (dabs(e3-el) .lt.l.0d-08)  go  to  420 
e2=e3 

e3=osc (3) +osc (4) *dsin (e3) 
if (dabs(e3re2) .lt.l.0d-08)  go  to  420 
410     e3=e3+ (e3-e2) **2/ (2 . 0d0*e2-el-e3) 

420     cosf=dcos (e3) 

el=l . OdO-osc (4) *cosf 
osc(l)  =  (cosf-osc(4))/el 
eta=dsqrt (1 . OdO-osc (4) **2) 
osc  (2)  =  (eta*dsin(e3)  )  /el 

return 
end 
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••••♦••••••••••••••••A************************************ 
subroutine  sector (hi) 


This  subroutine  computes  sectoral  correction  terms  to  be  added 
to  the  variation  of  each  of  the  orbital  elements 


implicit  real*8  (a-h,m-z) 

real*8  kz 

dimension  elm  (8)  ,slm(8)  ,tf(8),tfp(8),te(8,8) 

common  /ppt/f  (25)  ,osc  (9)  ,u(3)  ,v(3)  ,  w  (3)  ,  vel  (3)  ,  r,tra,kz 

common  /cons / c (25) 

common  /n/theta2, eta, eta2, esq 

common  /sec/agda, agde, agdi, agdl, agdg, agdh 

if (agda.ne.l.OdOl)go  to  330 


clm(l 
elm  (2 
elm  (3 
elm  (4 
elm  (5 
elm  (6 
elm  (7 
elm  (8 

slm(l 
slm(2 
slm(3 
slm  (4 
slm  (5 
slm  (6 
slm  (7 
slm  (8 


0.35961113d-07 
=0.18246786d-05 
=0.22207398d-05 
=clm(3) 

=0.37098633d-06 
=clm(5) 

=0.22580118d-06 
=clm(7) 

=0.24286442d01 

=0.57568525d01 

=0.12107857d00 

=slm(3) 

=0.96916794d0 

=slm(5) 

=0.11169656d01 

=slm(7) 


te 

(7,1) 

te 

(7,2) 

te 

(7,3) 

te 

(7,4) 

te 

(7,5) 

te 

(7,6) 

te 

(7,7) 

te 

(7,8) 

te 

(8,1) 

te 

(8,2) 

te 

(8,3) 

te 

(8,4) 

te 

(8,5) 

te 

(8,  6) 

te 

(8,7) 

te 

(8,8) 

=0.0d0 

=0.0d0 

=1.0d0 

— l.OdO 

=1.0d0 

— l.OdO 

-l.OdO 

— l.OdO 

-l.OdO 

=2.0d0 

-l.OdO 


OdO 
OdO 


=2. OdO 
=3. OdO 
=3. OdO 


feta=l .OdO+eta 
fl-l.OdO+f (8) 
f2=1.0d0-f (8) 
f3=1.0d0+3.0d0*f (8) 
f4=1.0d0-3.0d0*f (8) 
fl52=f (15) *f (15) 
rl=f (2) *f (11) 
r2=f (2) *f (12) -c (12) 

tf (1)=-I.5d0*f (8) *f (15) 
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tf (2)=1.5d0*fl52 

tf (3)=0.9375d0*fl52*f3-0.75d0*fl 

tf (4)=0.9375d0*fl52*f4-0.75d0*f2 

tf (5)=1.875d0*f (15) *fl*f4 

tf (6) — 1.875d0*f (15)*f2*f3 

tf (7)=5.625d0*fl52*fl 

tf (8) =5 . 625d0*f 152*f 2 


tfp(l)— 1.5d0*  (f  (8)  **2-fl52) 
tfp(2)-3.0d0*f  (8)  *f  (15) 

tfp(3)=f  (15)  *  (1.875d0*f  (8)  *f3-2 . 8125d0*f 152+0 . 75d0) 
tfp(4)=f (15) * (1.875d0*f (8) *f 4+2 . 8125d0*f 152-0 . 75d0) 
tfp(5)=fl* (1.875d0*f (8) *f4+3.75d0*fl52) 
tfp(6)=f2* (1.875*f (8) *f3-3.75d0*fl52) 
tfp(7)=f (15) * (11.25d0*f (8) *fl-5.625d0*fl52) 
tfp(8)=f (15) * (11.25d0*f (8) *f2+5.625d0*fl52) 


ta-l.OdO/f  (2) /f  (13) **5 

flpl=6.0d0 

tg=l . 0d0/eta2/eta 

tgp=3.0d0*f (5) *tg/eta2 

do  300  i=l,8 

if (i-3)301, 302,301 

302        flpl=8.0d0 

tg-=tgp/3  .OdO 

tgp=(1.0d0+4.0d0*esq) / (eta2**2*eta2*eta) 

ta=ta/f (13) 

301        tai=ta*clm(i) / (rl*te (7,i)+r2*te  (8,i) ) 
te(2f  i)=-tai*tf  (i)  *tg*eta*te  (7,  i)  /f  (5) 
te(3,i)=tai*tf  (i)  *  (te(7,i)  *f  (8)  -te  (8,  i)  )  /f  (15)  *tg/eta 
te(4,i)=tai*  (flpl*f  (5)  *tg-eta2*tgp)  *tf  (i) 
te (5,i)=tai* (tf (i) *f (5) *eta/feta*tgp+flpl*tf (i) *tg 
&       +f  (15)  /fl*tfp(i)  *tg/eta) 

300     te (6, i)=tai*tfp (i) *tg/eta 

go  to  340 
330     the=osc(6) 

if (kz.eq.0.0d0)the=the-c(12) * (f (9) +hl-c (1) ) 

do  341  i=l,8 

3int=osc (5) *te (7,i) +the*te (8,i)  -slm(i) 
cost=dcos (sint) 
sint=dsin (sint) 
agde=agde+te (2, i) *cost 
agdi=agdi+te (3, i) *cost 
agdl=agdl+te (4, I) *sint 
agdg=agdg+te (5, i) *sint 
341     agdh=agdh+te (6, i) *sint 

340     return 

end 
********************************************************** 

subroutine  message  (dl,  d2,  d3,mynod,.  dest ,  itype) 

*  This  subroutine  is  used  to  join  to  disjoint  variables 

*  into  a  single  array  to  be  sent  or  received  in  one  message 
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implicit  real*8  (a-h,m-z) 

dimension  msg2 (3) 

integer  mynod, dest , pid, itype,msglen 

data  pid/0/,msglen/8/ 


if (itype . eq. 0) then 

msg2 (l)=dl 

msg2  (2)=d2 

msg2  (3)=d3 

call  csend (mynod,msg2,msglen*3, dest, pid) 
else 

call  crecv (dest,msg2,msglen*3) 

dl=msg2 (1) 

d2=msg2 (2) 

d3=msg2 (3) 
endif 


return 
end 
********************************************************* 

subroutine  consl 

*  This  subroutine  is  used  by  NAVSPASUR  to  set  the 

*  constants  used  by  the  satellite  motion  model 

implicit  real*8  (a-h,m-z) 

common  /cons/c(25) 

BETA=398597. 62579588d0 
ERKM=6378.135d0 
FLAT=2  98.2  6dO 


K-TERMS 


C20=-0.4841605d-03 

C30-0.95958d-06 

C40=0.55199d-06 

C50=0.65875d-07 

c(3)=-0.5dO*C20*dsqrt (5.0d0) 

c(4)=C30*dsqrt (7.0d0) 

c (5)=0.375dO*C40*dsqrt (9.0d0) 

c(6)=C50*dsqrt (ll.OdO) 

TWO  PI 

c(7)=6.283185307179586d0 
c(16)=1.0d0/c(7) 

HERGS/DAY    ,     SECS/HERG,     MINS/HERG 

c ( 9) =ERKM*dsqrt (ERKM/BETA) 

c(8)=86400.0d0/c(9) 

c(17)=1440.0d0/c(8) 
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WE (RAD/DAY),  WE-  2  PI,  WE (RAD/HERG) 

c (11) =6. 3003880 987d0 
c(10)=c(ll)-c(7) 
c(12)=c(ll)  /c(8) 

EARTH  FLATTENING 

c  (13)  =  (2 . 0d0-l . OdO/FLAT) /FLAT 

SM/ER,  KM/ER,  NM/ER 

c (20) =ERKM/1 . 609344d0 

c  (21)=ERKM 

c (22) =ERKM/1 . 852d0 

DEG/RAD 

c(23)=360.0d0/c(7) 

RANGE  RATE/ER/HERG  TO  CYCLES/SECOND  -  CONVERSION 
c(24)=c(21) *216.980d+0  6/ (c(9) *2 . 997  925d+05) 

FENCE  PLANE  DISPLAC  FROM  EARTH  CENTER 
c(25)=0.31000d-02 


return 

end 
********************************************************* 
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APPENDIX  D 


P3T  SOURCE  CODE  LISTING 


This  appendix  contains  the  listing  of  host  and  node 
programs  comprising  P3T  program  set.  The  host  program  loads 
the  node  program  and  clears  the  nodes  once  the  process  is 
complete.  The  node  program  contains  the  instructions  for  the 
nodes  to  complete  their  respective  portions  of  the  P3T 
parallel  algorithm.  The  node  program  assigns  Node  0  to  be  the 
distributing  node,  the  highest  numbered  node  to  be  the 
collecting  node,  and  the  remaining  nodes  to  be  the  working 
nodes .  The  working  nodes  execute  the  original  NAVSPASUR 
subroutine  PPT2  with  only  minor  modifications . 

Host  Program: 

program  P3th 

*  This  host  program  loads  the  node  program  P3tn  on  the 

*  nodes  of  the  attached  hypercube.   Upon  completion  of 

*  the  catalog  of  satellite  data,  program  clears  nodes 

*  for  another  process. 

*  Set  host  specific  parameters 

data  pid/0/ 

*  Set  process  id 

call  setpid(pid) 

*  Load  program  P3tn  on  the  nodes 

print*, ' loading  nodes' 
call  load('p3tn'  ,  -l,pid) 

*  Receive  message  that  nodes  are  complete 

call  crecv (99, istop, 4) 
print*, ' nodes  complete' 

*  Kill  process  on  nodes 
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call  killcube (-1,-11 

stop 
end 
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Node  Program: 

program  P3Tn 

*  This  program  propagates  n-2  satellites  concurrently, 

*  where  n  is  the  number  of  nodes  belonging  to  the 

*  attached  cube.   Node  0  is  the  distributing  node  and 

*  Node  n-1  is  the  collecting  node.   The  remaining  nodes 

*  are  the  working  nodes  that  propagate  the  satellites 

*  using  NAVSPASUR  subroutine  PPT2.   For  simplicity, 

*  the  tasks  for  all  nodes  are  combined  on  this  one  node 

*  program.   Tasks  are  partitioned  by  logical  if  statements. 

implicit  real*8  (a-h,o-z) 
real*8  kf(10) 

integer  mynod, numnode, hostid,  pid 
integer  mynode, numnodes,myhost,mclock 
integer  eof 

dimension  ar (6, 8) , br (3, 6) , cr (3, 6) , dv (3) 

common/cons/a (64) 

common/ppt/f  (25)  ,osc(10)  ,kf(10),cf(10),bs(3,4),u(3),v(3),w(3),r, 
&     vel (3) , dind, tmf dkz, dident 

common/dcsub/pe  (6,8),e(8,8),ep(8,8),g(8),gp(8)  ,ifti  (8)  ,  if  to  (8)  , 
&     iteri, itero, jof , jol, stat (20) , tol (6) , iw, of (11) , ow (8, 8) 

common/f oreo/rho (3) , ros, hdr, hdv, rdv, del, iter 

common/bloc/sat (84, 4800) 

data  istop/1/, pid/0/,msglen/672/ 
data  isat/l/,n/l/ 

mynod=mynode ( ) 
numnode=numnodes () 
hostid=myhost () 

*000000000000000000000000000000000000000000000000000000 

*  Node  0  reads  and  distributes  data  among  the  working 

*  nodes 

i  f (mynod . eg . 0 ) t  hen 

*  Read  complete  catalog  of  satellite  data 

open  (unit=10,  f ile='  /usr/phipps/in'  ,  form='  unformatted'  ) 
read (unit =10, iostat=eof ) (sat ( j, 1) , j=l, 84) 

1200        if (eof .ge.0) then 

isat=isat+l 

read (unit =10, iostat=eof ) (sat (j,isat) , j=l, 84) 
go  to  1200 
endif 
close (unit=10) 

*  Send  number  of  data  sets  to  all  nodes 

t0=mclock ( ) 
isat=isat-l 
call  csend (mynod, isat,  4, -1,  pid) 
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*  Distribute  satellite  data  sets  to  working  nodes 

iter=isat/ (numnode-2) 

do  1201  j=l,iter 

do  1201  i=l, numnode-2 

call  csend (mynod, sat (1, n) , msglen,  i,  pid) 

1201  n=n+l 

iter«=mod  (isat,  numnode-2) 

n—n-1 

do  1202  j-l,iter 

1202  call  csend (mynod, sat (1, n+j) , msglen, j, pid) 

*  End  Node  0 

*00000000000000000000000000000000000000000000000000000 
*wwwwwwwwwwwvntfwwwwwwwwwwwww^ 

*  Begin  working  nodes 

else 

if (mynod. It .numnode-1) then 

*  Use  subroutine  consl  to  set  constants 

call  consl 


Receive  number  of  data  sets  from  Node  0  and  compute  total  number 
of  satellites  to  propagate 


call  crecv (0, isat,  4) 

t0=mclock  () 

if (mynod. le .mod (isat, numnode-2) ) then 

iter— isat/ (numnode-2) +1 
else 

iter=isat/ (numnode-2) 
endif 

*  Receive  satellite  data,  execute  PPT2,  and  send  results  to 

*  collecting  node 

do  1220  i«=l,iter 

call  crecv (0, f, msglen) 

*  Set  parameters  for  subroutine  ppt2 

ind=l 
kz=idint (dkz) 

*  Compute  secular  recovery 

call  ppt2(ind,kz) 

Compute  subsequent  task,  ie .  predict  position,  update  elements 

ind=idint (dind) 
call  ppt2(ind,kz) 

1220     call  csend (mynod, f, msglen, numnode-1, 0) 
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*  End  Working  Nodes 
*WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 

•cccccccccccccccccccccccccccccccccccccccccccccccccccccc 

*  Begin  Collecting  Nodes 

else 

*  Receive  total  number  of  satellite  sets 

call  crecv (0, isat, 4) 

*  Collect  results  from  Working  Nodes 

do  1230  i=l,isat 

1230  call  crecv (-1, sat (1, i) ,msglen) 

*  Write  results  to  external  file 

open (unit=ll, f ile=' /usr/phipps/ouf , form=' unformatted' ) 
do  1231  i=l,isat 

1231  write (unit=ll, *) (sat ( j, i) , j=l, 84) 
close (unit=ll) 

*  Send  message  to  Host  that  process  is  complete 

call  csend (99, ist op, 4, host id, pid) 

*  End  Collecting  Node 

*cccccccccccccccccccccccccccccccccccccccccccccccccccccc 

endif 
endif 

stop 
end 
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